Google seemingly leaked a treasure trove of technical search algorithm details by accident and now SEO people are getting real aggro

Around 2,500 technical documents detailing the nuts and bolts of Google's ranking algorithms have apparently leaked. If the documents are real, it's an unprecedented look into the workings of the utterly dominant internet search engine. And one hell of an error, because Google itself reportedly published the documents to GitHub before taking them down. But nothing published to the web disappears overnight, and the documents have been kept for posterity elsewhere.

This leak provides an interesting opportunity to compare the reality of how Google ranks its search results with the various claims the company has made about what has hitherto been largely a mysterious black box. The inner workings of Google Search have long been speculated upon but never really known outside of the company itself—or indeed inside the company by most Google employees.

The documents were shared with long-time SEO specialist Rand Fishkin by Erfan Azimi, an SEO advisor at EA Eagle Digital. Azimi says he shared the documents in the hope that they would reveal the "lies" propagated by Google in relation to its search platform.

That is obviously a very, very bold claim. Frankly, the documentation is incredibly dense and technical and covers a huge array of topics and systems. In really broad-brush terms, it covers the type and character of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and much, much more.

There are various areas where it's claimed that analysis of the documents throws up clear contradictions with Google's claims. For instance, in 2016 Google Search engineer Paul Haahr said that "using clicks directly in rankings would be a mistake."

But it's claimed the documents prove that Google uses a system known as NavBoost that directly incorporates various click count metrics into the page rankings and search results.

Other areas highlighted as contradicting previous Google claims include the use of Domain Authority, the sandboxing of new websites while more data is collected, the use of user data collected from the Chrome web browser, and more.

If these claims are all true, it's hard to be clear how much of this comes down to Google simply wanting to protect its search IP from potential competitors and how much can be chalked up to more cynical or even sinister motives.

Moreover, as far as we can tell, the documents do not actually reveal exactly how Google currently ranks pages. In other words, it does not appear that this leak will make it straightforward to optimise a web page to improve its Google search ranking, which is what a lot of observers would presumably have been praying for.

But if the documents are real, and the claims being made about the implications contained therein are broadly accurate, at minimum Google has a pretty major scandal on its hands in terms of the statements it has made in the past and its corporate credibility and ethics.

For now, that's a pretty big "if". This is a story that won't be resolved overnight. As far as we are aware, Google has yet to comment on whether the documents are real, let alone provide a riposte to the main critiques that have followed.

No doubt Google is formulating a detailed response as we write these very words. But we have a feeling that won't be the end of it, and the full fallout from this alleged scandal will be measured in months if not years.

Jeremy Laird
Hardware writer
