The IDcide Affair
Preface
This is a public account of an affair between Dash Space Inc. of Vancouver, British Columbia, Canada and Google Inc. of Mountain View, California. The following contains documentation of events regarding the way Google handled a domain owned by Dash Space as well as statements regarding those events.
Account of Events
Google places a "block" on the website
Sometime around May 27th, 2005 the rate of user referrals from Google's search engine to idcide.com started to slow until it stopped completely. All pages from the site that had previously been included in the Google index "disappeared".
On Saturday, May 28th, 2005 we contacted Google about the disappearance of the site and got an automated e-mail reply that suggested we perform a search for www.idcide.com to determine if the site is still included.
A Google search of the domain www.idcide.com yielded the following statement:
Sorry, no information is available for the URL www.idcide.com
This, per Google's e-mail, meant the site was not included in the index.
On May 29th we asked Google to advise. On June 14th, after not hearing anything for two weeks, we contacted Google again.
On June 28th we received the following reply from Google:
Thank you for your note. We apologize for our delayed response. Your page has been blocked from our index because it does not meet the quality standards necessary to assign accurate PageRank.
The same day we replied to Google's e-mail with a request to reconsider. Google replied:
We have sent your request on to our engineering team who will review your site to determine if it is eligible for re-inclusion.
That was the last we heard from Google for more than a year.
Website being used by Google
Though idcide.com was blocked from the Google index, Google's interest in the site did not subside. Google's crawler software continued to retrieve data from idcide.com frequently and repeatedly. In August 2005, for example, Google's software made 175,000 requests for URLs on idcide.com and used 1.5 GB of bandwidth. In September 297,000 requests were made and 2.5 GB of bandwidth were used.
Sometime during the month of May 2006, about a year after the site had been blocked, it became apparent that Google was not only retrieving data from idcide.com but also using it.
Data retrieved from idcide.com, and links referencing idcide.com, began to appear as answers in the Google Q&A feature (sometimes referred to as OneBox) and continued to appear until early November 2006.
An example of that usage by Google was a Google search for:
% of African American in Los Angeles
which resulted in the following: [Screen Capture]
At the top of the page was a Google Q&A answer - 11% - based on data retrieved from the following page on idcide.com: http://www.idcide.com/citydata/ca/los-angeles.htm
Using idcide.com pages within the Q&A feature did nothing to change Google's decision to block the site from its main results. Google also continued to claim it had no pages from the site in the index when it obviously did.
On July 7th, 2006 the situation was reviewed by Barry Schwartz of Search Engine Roundtable[sc] who wrote:
"Google pulls answers to questions from sites it indexes. But we have one documented case of a site that is not included in the index, but does return results for the Q&A style searches."
"But if conduct a site command search for the site at Google you get no returned results. Same thing if you try just searching on the domain name, "Sorry, no information is available for the URL idcide.com."
On July 7th 2006 Matt Cutts made comments regarding idcide.com[sc] on his blog:
"I can't speak to why that domain is highlighted in a "one box," but the part that I noticed was that back in November 2004, this was a completely different company". "Now someone else is using that same domain, and has ~40,000 hotel pages in Yahoo."
On July 10th, 2006 we contacted Google and requested that they remove the block from the domain, as it had become obvious that any quality claims made a year earlier had no merit. We received no response.
We requested again, on July 26th, that Google cease intervention and interference in the site's natural ranking within the main results. Again we received no response to this request.
Block removed and ranking intervention introduced
On October 4th, 2006 Ionut Alex. Chitu of the Google Operating System blog[sc] published a link to a video in which Peter Norvig, the Director of Research at Google Inc. and previous Director of Search Quality, explains the creation process of answers for the Q&A feature.
He wrote:
"One example of project where Google uses a lot of data is Google Q&A, that is extracting facts from web pages and delivering as answers to common questions like "what is the population of Japan?". Google doesn't use predefined patterns, they find the patterns from examples, as this approach is more scalable. They extract data by matching the patterns against the top results for a query."
On October 6th, 2006 the block placed on idcide.com, which had lasted almost 500 days, was suddenly lifted. Pages that seemingly had not existed in the Google index before appeared instantly.
However, all pages from the domain idcide.com now ranked extremely low compared to their rank before the block was introduced. They also ranked low compared to what a reasonable person would have considered a reasonable ranking.
This was initially seen by performing a Google search for the very specific phrase:
"Los Angeles's population has grown by about 6%" (with quotes)
which resulted in the following: [Screen Capture 1] [Screen Capture 2]
This very specific phrase appeared on a page from which Google extracted Q&A answers and which, hence, Google considers to be of good quality (http://www.idcide.com/citydata/ca/los-angeles.htm). At the time, Google had in its database only data from sites that scraped/copied the matching text from idcide.com. Google ranked those above the idcide.com original, and put the original in an "omitted results" status.
On October 12th, 2006 we made a draft of this public account available to Google for comments. Google replied on October 17th, 2006:
Due to the tremendous number of requests we receive, we're unable to personally respond to your email.
On October 23rd, 2006 we publicly published the IDcide Affair account.
Ranking intervention changed
Sometime around November 14th, 2006 Google stopped using idcide.com in the Q&A feature.
During the last months of 2006 and early in 2007 it became clearer that Google's intervention in the ranking of pages from the domain idcide.com was all-encompassing. No page from the domain could attain what a reasonable person would have considered a reasonable ranking.
This was easily demonstrated by performing a Google search for the very specific phrase:
"IDcide Affair" (with quotes)
which resulted in the following: [Screen Capture]
This very specific phrase never existed on the web prior to the publication of the IDcide Affair. At the time, this page was the only page about the topic. All other web pages containing the phrase were either referencing the page (http://www.idcide.com/affair/) or were referenced by it. Yet, Google ranked those above the IDcide Affair page.
On January 9th, 2007 the above example was posted publicly in forums. Sometime between January 9th and January 11th Google made a significant change to the way pages from idcide.com rank.
In most cases Google's intervention ceased. For most search queries, most pages would rank in a way that a reasonable person would consider a reasonable ranking. This includes all the above-mentioned examples.
Yet, this was not true for all queries or all pages. As of March 2007 it seemed that ranking intervention was still in place as far as travel-related search terms were concerned.
Ranking intervention weakened or eliminated
Sometime in early December 2007 Google made changes to the way it ranks pages from idcide.com. The ranking intervention for travel-related search terms is no longer easily detected, suggesting that it has been either weakened or eliminated.
More than two and a half years have passed, and at last, Google has let the site be.
Statements regarding the events
We believe that Google's claims in the e-mail dated June 28th, 2005 were intentionally misleading. There was not, and there is not, any "quality" problem with the site. Google had, and still has, no problem assigning accurate PageRank to its pages.
From May 27th 2005 and up until September 2006, Google was replying to site and domain searches with provably false statements of fact. Google claimed such searches matched no documents, when in fact Google was holding full copies of these documents in its cache database.
We believe that we have never violated any of Google's published Webmaster Guidelines. Furthermore, we did not make any significant change to the website from early 2005 until early 2007. Whatever characteristics the site had prior to being blocked, it still had while being blocked. Whatever characteristics it had while being blocked, it still has now after the block has been removed. We thus believe whatever prompted Google to block and then unblock idcide.com was not concern about the quality of the site but rather Google's own internal interests.
Matt Cutts describes himself in his own blog[sc] in this way:
"I'm one of several Googlers who answer questions online and sometimes for the press. I usually handle questions about webmasters or SEO, so in those areas I'm more likely to make sense and less likely to say something stupid."
"This is my personal blog. The views expressed on these pages are mine alone and not those of my employer."
We respect Matt's disclaimer and have no claim against him. That does not diminish the value of his opinion, which we think hints at what triggered the blocking of idcide.com by Google more than a year beforehand.
The statements made by Matt Cutts are indeed true. Yahoo indeed lists most pages of the site. The domain did indeed change ownership in 2005, accompanied by a WHOIS information update as required by ICANN regulations. As a matter of stating the obvious, we would like to point out that Google Webmaster Guidelines do not define the number of pages, the type of content or the ownership duration that will result in a site block.
We believe Google has a process by which it monitors the results of its own ranking algorithms for specifically chosen keyword combinations. This process provides Google with an indication as to what keyword combinations send how many referrals to what domains.
Between 2005 and 2007 most of idcide.com's competitors were funded by presenting Google AdSense advertising. At that time we chose to fund idcide.com through affiliation agreements with travel companies. We integrated travel-related pages into the site in a logical and straightforward manner. The number of such pages at idcide.com is very small when compared to what Google collects from other sites. (For example: 8 million pages from travel.yahoo.com[sc]).
We believe idcide.com was blocked because Google's monitoring process found the domain was receiving "too many" travel-related referrals compared to what it believed should be the "right" distribution of such. The site wouldn't have been blocked had it not come up in searches for travel-related keywords.
By deduction, had we chosen to use the Google AdSense program instead of affiliation with travel companies, idcide.com wouldn't have been blocked.
Hence we believe that Google used a process to block idcide.com while knowing that it was interfering with idcide.com's business, and that such interference contributed to Google's own business.
Our analysis shows that prior to removing the block from idcide.com, Google intentionally modified the data it held about idcide.com, or the way it processed that data, with the intended result of ensuring that pages from idcide.com would not regain the rank they had before the block was introduced. Thus, Google again intentionally changed its search results to limit user referrals from Google's search engine to idcide.com.
Based on how events unfolded we have reason to believe that in some cases Google has been modifying data not as an attempt to correct previous wrongdoing, but rather, in order to obscure previous actions, make the idcide.com case harder for the average person to understand and divert any claims we may have against them.
We believe that while previous events surrounding the domain idcide.com were fairly unique, the nature of intervention in ranking was not. We believe there was an entire class of domain owners whose domains were treated the same way but are not aware of it.
We hope that the changes introduced late in 2007 signal not just a change in algorithms but also a change of heart, and that Google has finally understood that "with great power there must also come great responsibility."
First Published: October 23rd, 2006
Last updated: January 9th, 2008