
06/27/08

06:24:00 pm, Categories: SpamScam, Indexers, Tags: cuill, robots, twiceler

Originally published April 23, 2008; updated and republished May 18, 2008, and June 27, 2008.

UPDATE 06/27/2008: Twiceler is still behaving. It enters the site at reasonable intervals after reading robots.txt, crawls like a spider rather than an elephant, and has begun leaving helpful notes explaining its crawlers' intention and duration.

UPDATE 05/18/2008: Twiceler is behaving better, entering the site at reasonable intervals after reading robots.txt and leaving for an extended period when it receives its first 403 (Forbidden) response.
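The backoff described in this update is easy to picture from the crawler's side. The sketch below is a hypothetical Python illustration, not Cuill's code: the `HostBackoff` class and the one-day `BACKOFF_SECONDS` value are assumptions, since the post says only "an extended period".

```python
import time
from typing import Dict, Optional
from urllib.parse import urlsplit

# Assumed value: the post says only "an extended period".
BACKOFF_SECONDS = 24 * 60 * 60


class HostBackoff:
    """Hypothetical helper that stops crawling a host after its first 403."""

    def __init__(self, backoff_seconds: float = BACKOFF_SECONDS) -> None:
        self.backoff_seconds = backoff_seconds
        # Maps host name -> time (seconds since epoch) when crawling may resume.
        self._blocked_until: Dict[str, float] = {}

    def record_status(self, url: str, status: int, now: Optional[float] = None) -> None:
        """Note the HTTP status returned for url; a 403 starts the backoff."""
        if status != 403:
            return
        now = time.time() if now is None else now
        host = urlsplit(url).netloc
        # setdefault: only the first 403 starts the clock; later ones
        # during the backoff do not extend it.
        self._blocked_until.setdefault(host, now + self.backoff_seconds)

    def may_fetch(self, url: str, now: Optional[float] = None) -> bool:
        """Return True if the crawler is allowed to request url right now."""
        now = time.time() if now is None else now
        host = urlsplit(url).netloc
        until = self._blocked_until.get(host)
        if until is None:
            return True
        if now >= until:
            del self._blocked_until[host]  # backoff over; forget the block
            return True
        return False


backoff = HostBackoff()
backoff.record_status("http://example.com/page", 403, now=0)
print(backoff.may_fetch("http://example.com/other", now=60))  # False: still backing off
```

Keying the block on the host rather than the URL is what makes the crawler leave the whole site, not just the one forbidden page.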

Cuill, a new Silicon Valley search engine startup, is running a rude, misbehaving, rogue robot called Twiceler.

Twiceler is unregistered and undocumented, ignores robots.txt, and changes its user-agent name (the {HTTP_USER_AGENT} variable) in response to regular-expression blocking.
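Blocking by user agent usually means matching a regular expression against {HTTP_USER_AGENT} and answering with a 403. The author's actual rule isn't shown, so here is a minimal Python sketch of the general technique, using a hypothetical `status_for` helper and made-up user-agent strings. It shows why renaming the agent defeats this kind of block.

```python
import re

# Hypothetical blocking rule: refuse any request whose User-Agent
# contains "twiceler", ignoring case.
BLOCKED_AGENTS = re.compile(r"twiceler", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return the HTTP status a site using this rule would send."""
    if BLOCKED_AGENTS.search(user_agent or ""):
        return 403  # Forbidden
    return 200


# A bot that keeps its name is refused...
print(status_for("Mozilla/5.0 (Twiceler; hypothetical UA)"))  # 403
# ...but the same bot under a new name passes the same rule.
print(status_for("Mozilla/5.0 (compatible; RenamedBot)"))  # 200
```

Because the rule keys on a name the client chooses, blocking by IP address is a common fallback.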

According to Cuill, Twiceler runs from the following IP address ranges:

  • 38.99.13.121-38.99.13.126
  • 38.99.44.101-38.99.44.106
  • 64.1.215.162-64.1.215.166
  • 208.36.144.6-208.36.144.10
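If you would rather block by address, the ranges above can be checked with Python's standard `ipaddress` module. This is a sketch: `TWICELER_RANGES` simply restates the list above (assuming each range includes both endpoints), and `in_listed_ranges` is a hypothetical helper name.

```python
import ipaddress

# The ranges listed above, as (first, last) addresses, both inclusive.
TWICELER_RANGES = [
    ("38.99.13.121", "38.99.13.126"),
    ("38.99.44.101", "38.99.44.106"),
    ("64.1.215.162", "64.1.215.166"),
    ("208.36.144.6", "208.36.144.10"),
]


def in_listed_ranges(addr: str) -> bool:
    """Return True if addr falls inside any of the listed IPv4 ranges."""
    ip = ipaddress.ip_address(addr)
    if ip.version != 4:
        return False  # all listed ranges are IPv4
    for first, last in TWICELER_RANGES:
        if ipaddress.IPv4Address(first) <= ip <= ipaddress.IPv4Address(last):
            return True
    return False


# Express each range as CIDR blocks, e.g. for a firewall or deny list.
for first, last in TWICELER_RANGES:
    nets = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
    print(", ".join(str(n) for n in nets))
```

None of these ranges falls on a clean CIDR boundary, which is why each one expands into several small blocks.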

It does not seem like a wise strategy for a startup search engine company (or anyone, for that matter) to aggressively flout the directives of website administrators, particularly when you're running an unregistered and undocumented (rogue) robot.

Some important factors in judging whether a bot is SPAM:

  • Is the bot registered with robotstxt.org?
  • Is the bot well documented, with contact information provided?
  • Does the bot read the robots.txt file?
  • Does the bot adhere to robots.txt directives?
  • Is the bot well behaved on the site?
  • Does the bot use a reasonable crawl/index rate and reasonable crawl times?
  • Is the bot part of a university or student research project?
  • Does the bot add or subtract transparency:opaqueness?
  • Does the bot enable or disable, directly or indirectly, censorship:surveillance?
  • Is the bot nation-state, third-party nation-state, private, corporate, NGO, or personal?
  • Is the bot for commercial gain?
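The robots.txt questions in this list can be answered from the crawler's side with Python's standard `urllib.robotparser`. The robots.txt content below is a made-up example, not this site's file, and `Crawl-delay` is a non-standard extension that only some crawlers honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut Twiceler out entirely, and ask all other
# bots to wait 10 seconds between requests and skip /private/.
ROBOTS_TXT = """\
User-agent: Twiceler
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved bot checks both questions before every request:
# "may I fetch this URL?" and "how long should I wait between fetches?"
for agent in ("Twiceler", "SomeOtherBot"):
    print(agent,
          rp.can_fetch(agent, "http://example.com/index.html"),
          rp.crawl_delay(agent))
```

A bot that never requests robots.txt, or requests it and then ignores the answers, fails the third and fourth questions above no matter how politely it names itself.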

