Thomas is the Library of Congress’s legislative Website. Here, you will find the full text of every law and bill. Go ahead, search for the text of a law at Google, and it should return a link to Thomas.

But it doesn’t.

Google doesn’t search Thomas, and neither does any other search engine. They stay away because the Webmaster there has created a robots.txt file (the file that every search engine’s spider checks to see which parts of a Website it may visit) that reads like this:

 User-agent: *
 Disallow: /

Here’s the human translation: All search engines, get lost.
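If you want to see how a well-behaved spider reads those two lines, here is a minimal sketch using the robots.txt parser from Python's standard library. The bot name and the example.gov URL are placeholders I made up for illustration, not Thomas's real addresses, and the helper function `is_allowed` is my own invention rather than anything a search engine actually runs.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    Hypothetical helper: it parses the rules from a string instead of
    downloading them, so the sketch runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL and bot names, assumed for illustration only.
    page = "http://example.gov/bills/some-bill.html"
    for bot in ("Googlebot", "SomeOtherSpider"):
        print(bot, "allowed?", is_allowed(ROBOTS_TXT, bot, page))
```

Every spider that honors the file gets the same answer for every page, because the wildcard user-agent matches all of them and the lone slash covers the whole site.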

There’s no earthly reason to exclude search engines from Thomas’s guts. Thomas’s pages have static URLs that can be easily indexed, and Thomas’s maintainers keep search-engine-friendly directories of laws. U.S. law is free from copyright, so they can’t be worried about copies being made. Thomas isn’t supported by banner ads, so sending visitors directly to the appropriate page isn’t a problem; if anything, it cuts down on bandwidth costs.

I may be paranoid, but it seems to me that there’s only one reason to keep search engines away from Thomas: To keep people away from Thomas.

Here’s a project that was created for the purpose of democratizing the law, putting it in the public eye. But instead of shouting Thomas’s presence from the hills, its Webmaster has ensured that Thomas will be consigned to the most obscure corner of the Internet, known only to civil-rights cranks and Beltway insider-nerds. What’s your theory?