Even now, six months after moving my blog from http://users.chariot.net.au/~jktaheny/blogger/blog.htm to http://taheny.com, Google still ranks my old address above this one for most searches. It is annoying, as I want visitors to come here and read ‘Joe’s up to date Ramblings’.
After some contemplation, today I added a robots.txt file to my old website. In theory, the following two rules should stop all well-behaved search engine robots from crawling the old blog:
User-agent: *
Disallow: /
If the rules work and the search robots (including Google) stop visiting my old blog, I hope and expect it to drop out of the search rankings and be replaced by my current blog. We will see.
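Rather than waiting weeks to find out whether the two lines do what they are meant to, the rules can be checked locally. The following is a minimal sketch using Python's standard `urllib.robotparser` module; the helper name `is_blocked` and the bot name `SomeOtherBot` are made up for illustration. It only shows how a compliant robot would interpret the file, not what Google will actually do with its rankings.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the robots.txt file on the old site.
RULES = """\
User-agent: *
Disallow: /
"""


def is_blocked(rules: str, user_agent: str, url: str) -> bool:
    """Return True if a robot that obeys `rules` may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    old_blog = "http://users.chariot.net.au/~jktaheny/blogger/blog.htm"
    # "SomeOtherBot" is a hypothetical crawler name used only as an example.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, "blocked:", is_blocked(RULES, agent, old_blog))
```

Because the rules use `User-agent: *`, every robot name should get the same answer. Keep in mind that robots.txt is only a request: a robot that ignores the file is not stopped by it.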
When looking for robots.txt advice, I came across the White House’s robots.txt file, which has over 2,000 lines of text. That many lines in a robots.txt file is not uncommon for large websites (whitehouse.gov has over 600,000 pages).
What I did find strange was that almost all of the pages listed in the file were Iraq-related. Here is a random screenshot:
I know Iraq has been a major issue, but surely most of the White House web pages are not Iraq-related. If so, why does the White House not want so many of its Iraq pages spidered? Are they embarrassed by the mess they have made?
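An impression like "almost all" is easy to put a number on. Below is a rough sketch that pulls the `Disallow` paths out of a robots.txt file and reports what share of them contain a keyword. The function names and the sample paths are invented for illustration and are not taken from the real whitehouse.gov file; to measure the real thing, the actual file's text would have to be passed in instead.

```python
def disallowed_paths(robots_text: str) -> list[str]:
    """Collect the path from every non-empty Disallow line."""
    paths = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def keyword_share(paths: list[str], keyword: str) -> float:
    """Fraction of paths that contain `keyword`, ignoring case."""
    if not paths:
        return 0.0
    hits = sum(1 for path in paths if keyword.lower() in path.lower())
    return hits / len(paths)


# Hypothetical sample, NOT the real whitehouse.gov robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /example/iraq/page-one
Disallow: /example/iraq/page-two
Disallow: /example/other/page-three
Disallow: /example/iraq
"""

if __name__ == "__main__":
    paths = disallowed_paths(SAMPLE)
    print(f"{len(paths)} disallowed paths, "
          f"{keyword_share(paths, 'iraq'):.0%} mention 'iraq'")
```

A plain substring match like this is crude, since it counts any path that merely contains the word, but it is enough to tell "a handful" from "nearly everything".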
UPDATE: The White House/robots.txt/Iraq issue has been covered many times before.