Matt Cutts’ recent video on why pages disallowed by Robots.txt still appear in Google’s index is the latest display of Google’s arrogance. When you break it down, he is saying that Google doesn’t care what you tell them; they are going to do what they want with your site. To get them to follow your Robots.txt directives, you will have to jump through some additional hoops.
Rationalizing with Technicalities
I know that he is technically correct: Robots.txt only tells crawlers what not to crawl, and Google is not crawling the blocked pages. As Matt explains, Google is not “crawling” those pages at all; it is simply indexing the URLs without ever seeing their content. That raises a separate search quality issue, but I digress. The problem is that most webmasters list pages in their Robots.txt because they don’t want them crawled or indexed. By indexing blocked pages, Google is violating the webmaster’s trust and intentions for the site. Google can essentially destroy a business for violating its guidelines, yet it has no problem ignoring the rules webmasters set for how robots interact with their sites.
Adhering to Google’s Whims
Obviously, if you want Google traffic you have to play the game. Some may be willing to go the cat-and-mouse blackhat route, while most will simply try to follow Google’s ever-changing rules and whims. That also means you have to watch out for Google deciding to ignore how you want your site to be accessed. So if your Robots.txt blocks certain pages, make sure you also add the Robots Meta Tag set to Noindex so Google does not violate your wishes by indexing them. Who knows how long they will actually follow the Robots Meta Tag directives, but they claim to do it now. That could change tomorrow, or it could even be patently false right now, as Mr. Cutts has been known to say one thing when the reality is completely different. And always remember: Google knows better than you, and if you just do what they tell you to do, everything will be OK.
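The advice above relies on two separate mechanisms: Robots.txt controls crawling, while the Robots Meta Tag controls indexing. One caveat worth knowing is that Google’s own documentation says a crawler can only see a noindex tag on a page it is allowed to fetch, so a page that is both disallowed in Robots.txt and tagged noindex can still show up as a bare URL. The minimal Python sketch below, using only the standard library’s urllib.robotparser and html.parser, flags that combination for a single page. The function name audit_page, the example.com URLs and the status strings are hypothetical, and urllib.robotparser uses plain prefix matching rather than Google’s wildcard and longest-match rules, so treat the result as a rough check only.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Detects a robots (or crawler-specific) meta tag containing noindex."""

    def __init__(self, user_agent: str):
        super().__init__()
        self.user_agent = user_agent.lower()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases names; values may be None for bare attributes.
        attr = {k: (v or "") for k, v in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", self.user_agent):
            directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.noindex = True


def audit_page(robots_txt: str, url: str, html: str, user_agent: str = "Googlebot") -> str:
    """Hypothetical helper: classify how a page's crawl and index directives interact."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    crawlable = rp.can_fetch(user_agent, url)

    meta = RobotsMetaParser(user_agent)
    meta.feed(html)
    meta.close()

    if crawlable and meta.noindex:
        return "noindex visible"  # crawler can fetch the page and read the tag
    if meta.noindex:
        return "conflict: blocked, noindex unseen"  # tag exists but cannot be read
    if not crawlable:
        return "blocked only"  # URL may still be indexed from links alone
    return "indexable"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(audit_page(robots, "https://example.com/private/a.html", page))
    print(audit_page(robots, "https://example.com/public/a.html", page))
```

Run against the sample Robots.txt, the page under /private/ is reported as a conflict, while the same page under /public/ is reported as having a visible noindex tag.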
2 comments
Hi there,
Great approach. Yes, you are exactly right. I have seen this video. He talked about the dmv.ca.gov site. They have blocked a few pages with a nofollow tag, but Google is still indexing those pages…
I am facing the same problem. A few of my pages in Joomla are not supposed to be indexed, but Google is still indexing them, and my Webmaster Tools account is showing many errors.
Do you know how to solve problems like this?
The easiest thing to do is simply apply the robots noindex meta tag to the pages you don’t want indexed. As for the errors, I can’t be sure without actually looking at them and at your site to see what the deal is.