This is a question I am struggling with right now. I have seen a few discussions about Googlebot trying to crawl pages that don't exist, and I have seen it happen on some of the sites I monitor. I have heard a few explanations for this, and it never concerned me much, but a recent situation has me wondering whether Google actively seeks out duplicate content.
Two Sites with Similar Names
I am doing work for two sites with very similar names. Even though the names are similar, they sell two totally different products and do not share any content. I was recently in Google Webmaster Tools (GWT) and noticed that one of the sites had a fairly large number of 404 errors reported. When I took a quick look, I saw that every one of those errors came from Googlebot requesting paths that exist only on the other site, but under this site's domain. For example, site1 has a URL like www.site1.com/top-product. Site2 does not sell that product and therefore has no page with that name, yet Googlebot still tried to crawl www.site2.com/top-product. Of course, the request was met with a 404, but this is where it gets weird.
Even though those pages never existed and Googlebot received a proper 404 response, they were still indexed for some reason. This happened to a number of pages (roughly 20 to 30). It almost looks as if Googlebot set out specifically to check whether these sites, which are owned by the same people and have similar domain names, were serving duplicate content. That would make sense, and honestly it does not bother me, because the sites are completely separate and offer different products. What does concern me a bit is that a site: search showed the non-existent pages as indexed.
Has anyone else seen this sort of activity?