Wednesday, April 30, 2008

Stemming the rise of the machines?

[Image: Wikipedia Checkuser icon, made from the Wikipedia logo and Image:Gnome-searchtool.svg (uploaded by User:Seahen)]
The machines are reading
(c) Wikimedia Foundation
Sometimes events move faster than we plan.

Another tidbit from my visit to the Wikimedia Foundation office on Monday: I got a preview of this post by Cary Bass, which addresses robots.txt, the bit of magic that controls what is searched and what is not. (More specifically, well-behaved robots/spiders honor what it says and do not scan the pages it lists. Less well-behaved robots/spiders eventually get blocked completely once their bad behaviour is discovered.) The post makes the case that non-articlespace pages ought not to be quite as visible in searches as articles. Articlespace pages were designed to be read by the intended audience of the encyclopedia itself; the other pages often contain bickering and name calling or, even worse, are "userified" pages that contain clear BLP violations, fringe theories given undue weight, or all sorts of other problematic content.
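For the curious, here is a minimal sketch of how a well-behaved spider might consult robots.txt before scanning a page, using Python's standard urllib.robotparser module. The rules, the example.org URLs, the /user/ and /talk/ paths, the "ExampleBot" agent name, and the may_fetch helper are all made up for illustration; they are not Wikimedia's actual robots.txt or URL layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: ask every robot to stay out of
# two made-up non-article areas. Not Wikimedia's real rules.
RULES = """\
User-agent: *
Disallow: /user/
Disallow: /talk/
"""


def may_fetch(url, agent="ExampleBot"):
    """Return True if the hypothetical RULES allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the lines of a robots.txt file; no network access needed.
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/wiki/Some_article",
        "https://example.org/user/Example/draft",
    ):
        verdict = "allowed" if may_fetch(url) else "disallowed"
        print(url, "->", verdict)
```

Note that nothing here enforces anything: the file is only a request, and it is up to each robot to run a check like this and respect the answer.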

Unbeknownst to Cary, after he had written that blog post but before it was published (the wonders of delayed publishing), Newyorkbrad posted this to the English Wikipedia and Foundation mailing lists. In it, he advances a remarkably similar thesis: that we should act to avoid giving undue weight to material that really isn't intended for the readership.

Bugzilla mascot
Image via Wikimedia Commons

Partially in support of this, I entered an enhancement request in Bugzilla for code changes to make it easier to control what is and isn't in robots.txt. Your thoughts or comments on that bug would be appreciated.

See also this mailing list post by Jimmy Wales in which he expresses agreement with the general idea that we should use this as a way to avoid doing unnecessary harm. Bravo!

As it turns out, this topic (among others) has been getting considerable discussion at the dreaded Wikipedia Review, but I believe Brad advanced the idea because it's a good idea, not because of the pressure that some were trying to exert. More on that topic later, but for a taste of it you can review this thread.
