Google New Page Rank (PR)

Google New PR: page level based on link distance

PageRank (PR) has been Google's best-known concept ever since Google was founded, so much so that many people think Google PR has more to do with public relations hype than with the ranking algorithm. In April 2016, Google removed the PR display from its toolbar, so webmasters could no longer see up-to-date PR values. Many SEOs took this to mean that Google had dropped the PageRank algorithm altogether. Is PageRank still part of Google's current ranking algorithm? Opinions differ.

Google has not used PageRank since 2006

The day before yesterday, in a Hacker News thread discussing alternatives to Google, a self-described former Google software engineer stated that Google has not used PageRank since 2006:

The comments here that PageRank is Google’s secret sauce also aren’t really true – Google hasn’t used PageRank since 2006. The ones about the search & click through data being important are closer…

Put plainly, the claim is that Google's algorithm stopped using PageRank long ago.

In addition, click-through rate is also one of the important ranking factors in Google's algorithm. So far, though, no genuinely effective click-simulation or quick-ranking tool for Google has appeared, while Baidu seems to have given up on tackling the equivalent tools that target Baidu. That is a separate topic and is not covered here.

Is this real?

To verify the former Google employee's identity, SEOs did some digging and concluded that the claim is very likely genuine. The employee's name is Jonathan Tang, he has a Twitter account, and he worked at Google from 2009 to 2014. His Hacker News account was registered in 2007, and he is unlikely to use such an old account to post nonsense or spread rumors.

Moreover, John Mu, Google's official liaison with the SEO industry, has not denied this comment on Twitter. He said only:

SEOs should realize that Google's engineers cannot have gone 20 years without making changes to search.

So although there is no official confirmation, the claim is probably true. In fact, Matt Cutts and others have long said that Google's algorithm certainly takes links into account, and that links remain the most important ranking factor. The basic principles behind PageRank and Google's rankings have not changed, but they cannot have stayed exactly as they were for so many years. The original version of PageRank must have been modified.

John Mu also commented directly on the use of PageRank last year, indicating that the current Google algorithm is unlikely to use the original PageRank formula unchanged.

So what is the PR displayed by the toolbar after 2006?

As mentioned earlier, Google removed the PR display from the toolbar in 2016. If PR had not been used since 2006, what was the PageRank shown in the toolbar from 2006 to 2016?

Moreover, another Google spokesperson, Gary Illyes, stated clearly in 2017: “Did you know that after 18 years, Google still uses PageRank (and hundreds of other signals) in its ranking algorithm?”

So is it in use or not?

Jonathan Tang explained it later:

They replaced PR in 2006 with another algorithm that gives roughly similar results but is much faster to compute. The PR value shown in the toolbar is the output of this replacement algorithm. Its name is similar to PageRank, so Google's claim is not technically wrong.

Therefore, since 2006, what the Google algorithm has used, and what the toolbar displayed, is not the result of the original PageRank formula but the output of an algorithm with similar results, a similar name, and much faster computation.

Let’s call it Google’s new PageRank.

So how is this new Google PR calculated? Jonathan Tang did not say, and did not even give the algorithm's real name, so everyone can only guess.

The patent suspected to describe Google's new PageRank

Below Jonathan Tang's post, Bill Slawski, who specializes in studying Google patents, replied:

Google's patent for a new version of PageRank dates from 2006. Coincidence?

Bill Slawski published a detailed introduction to this new PageRank patent last year. I have spent the past two days reading the original patent and Bill Slawski's post carefully. Here is an overview.

The patent is titled "Producing a ranking for pages using distances in a web-link graph", that is, ranking pages based on link distance.

Figure: link-graph structure of web pages

Simply put, the new PageRank no longer counts the total number of inbound links. Instead, it calculates the distance between a page and the seed pages. The shorter the distance, the higher the page's quality, and the higher its rank, that is, its new PageRank. The idea is very similar to Yahoo!'s TrustRank. The underlying assumption is that good websites do not link to bad websites, but do link to other good websites.

Seed page, link length, link distance

The patent involves several concepts: seed pages, link length, and link distance.

Seed Pages

As shown in the simple link graph above, Google selects a subset of pages as seed pages: pages 106, 108, and 110 in the upper part of the figure. The pages in the lower part are outside the seed set, and their new PR values need to be calculated.

A few key points about seed pages:

  • Seed pages are clearly high-quality pages. The examples given in the patent are the Google Directory (which was in fact the now-defunct Open Directory) and the New York Times.
  • Seed pages need to be well connected to non-seed pages, with many outbound links pointing to other high-quality pages.
  • Seed pages need to be stable, reliable, and diverse, covering a wide range of topics.

Link length

Some non-seed pages are close to the seed pages and some are far away. For example, seed page 106 links directly to non-seed page 112 via link 132, whereas non-seed page 118 has no seed page linking directly to it and is reached through two layers of links.

The link distance is not simply the number of link layers. Google calculates a link length for each link, which depends on the characteristics of the link itself and of the page it sits on, such as how many links the page contains, where the link is placed, the font used for the link text, and so on.

So not all links have the same link length:

  • The more outbound links a page has, the longer each link's length. This matches the original PageRank idea: the more outbound links, the less weight each one carries.
  • The more prominent the link's position, such as in the main body or near the beginning of the text, the shorter the link length.
  • The larger the font size of the link's anchor text, or if the link is in an H1, the shorter the link length.
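
The three rules above can be illustrated with a small sketch. The function name, its parameters, and every weighting constant here are hypothetical choices made for illustration. The post does not give the patent's actual formula, so this only reproduces the direction of each effect.

```python
import math


def link_length(out_link_count, in_main_body, font_scale):
    """Hypothetical link length: a smaller value means a 'shorter', stronger link.

    All weights below are illustrative assumptions, not values from the patent.
    """
    # More outbound links on the page -> longer link (log damping is an assumption).
    length = 1.0 + math.log(max(out_link_count, 1))
    # A link in the main body is treated as shorter than a footer or sidebar link.
    if in_main_body:
        length *= 0.5
    # Larger anchor text (font_scale > 1.0, for example inside an H1) shortens the link.
    length /= max(font_scale, 0.1)
    return length


body_link = link_length(out_link_count=10, in_main_body=True, font_scale=1.0)
footer_link = link_length(out_link_count=10, in_main_body=False, font_scale=1.0)
```

With these assumed weights, `body_link` comes out smaller than `footer_link`, so the body link counts as the shorter of the two.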

I recall that Matt Cutts mentioned a possible revision of PageRank a long time ago. The probability that a user clicks a link in the body text is obviously very different from the probability for a footer link, so links in different positions should receive different PR and weight. That statement is consistent with this patent.

Link Distance

The link distance is the total link length along the shortest path from the seed page set to the page. There is usually more than one link path between the seed pages and a non-seed page. As shown in the figure, page 118 may be reached from seed page 106 through links 132 and 136, through links 134, 142, and 140, or through links 134 and 140. It can also be reached from other seed pages through other links. All of these are link paths from the seed set to the page, and the total length of the shortest one is defined as the link distance.

If a page cannot be reached from any seed page, that is, there is no link path at all from the seed set to the page, then its link distance is infinite.
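
The definition above is a shortest-path problem with several starting points. One standard way to compute it is Dijkstra's algorithm started from all seeds at once. The sketch below uses that technique as an assumption, since the post does not say how Google implements it. The page numbers loosely follow the figure, but the graph layout, the link lengths, and page 120 are made up for illustration.

```python
import heapq
import math


def link_distances(graph, seeds):
    """Multi-source Dijkstra: shortest total link length from any seed to each page.

    graph maps page -> list of (target_page, link_length), with link_length >= 0.
    Pages that no seed can reach keep a distance of math.inf.
    """
    dist = {page: math.inf for page in graph}
    for targets in graph.values():
        for target, _ in targets:
            dist.setdefault(target, math.inf)
    heap = []
    for seed in seeds:
        dist[seed] = 0.0
        heap.append((0.0, seed))
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist[page]:
            continue  # stale queue entry, a shorter path was already found
        for target, length in graph.get(page, []):
            candidate = d + length
            if candidate < dist[target]:
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


# Hypothetical graph: 106 is a seed, 120 is a page that no seed links to.
example_graph = {
    106: [(112, 1.0), (114, 0.5)],
    112: [(118, 2.0)],
    114: [(116, 0.5)],
    116: [(118, 0.5)],
    120: [(118, 0.1)],
}
distances = link_distances(example_graph, seeds=[106])
```

In this example the three-hop path through pages 114 and 116 is shorter in total link length than the two-hop path through page 112, which is why link distance is not the same as the number of link layers. Page 120 has no path from the seed, so its distance stays infinite.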

The Google algorithm then calculates a page's ranking score, that is, its new PR value, from the link distance. In the final ranking algorithm, this new PR value is one of the ranking factors. In other words, the shorter the link distance and the closer the page is to the seeds, the more important Google considers the page and the higher its ranking ability.

Calculating link distance does not require iterating until the values converge, so it is much faster than calculating the original PageRank. As for how well it represents page importance, I believe Google has compared the two and found the accuracy to be similar, which is why it was used to replace the original PR.
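
For contrast, here is a minimal sketch of the original PageRank in its commonly published textbook form, computed by power iteration. The damping factor of 0.85, the fixed iteration count, and the handling of pages without outbound links are conventional choices assumed here, not details from the post or from Google's production system. The point is only that every pass must revisit every link, and many passes are needed before the values settle.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    out_links maps page -> list of pages it links to; every page must be a key.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each pass with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in out_links.items():
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

Here page "c" receives no links, so it ends up with the lowest rank, while "a" and "b" reinforce each other over repeated passes.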

Reduced Link-Graph

Finally, the patent mentions one more concept, the reduced link-graph, but does not explain what it is used for. The patent ends with a single paragraph that briefly describes the concept. However, the reduced link-graph may be related to link quality assessment, the Penguin algorithm update, and the like.

In the earlier figure, all the links between all the pages make up the complete link graph. The part consisting only of links on shortest link-distance paths is called the reduced link-graph, that is, the links actually used to calculate the new PR values. The reduced link-graph is obviously a subset of the complete link graph, but it preserves every page's link distance: the links that were removed have no effect on any page's link distance or new PR value. In the reduced link-graph, the link weight each page receives can be traced back to its nearest seed page.

If a page cannot be reached from the seed set by any link path, that is, its link distance as described above is infinite, the page is excluded from the reduced link-graph. If all of a page's links come from outside the reduced link-graph, its link distance is still infinite, no matter how many such links it has.

In other words, links outside the reduced link-graph are ignored, however many there are. One characteristic of the Penguin 4.0 algorithm update is that spam links are ignored and not counted in the flow of link weight, which closely resembles this link-distance-based page ranking.
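
One way to read the reduced (simplified) link graph described in this section is as a filter: keep a link only if it lies on a shortest path from the seed set. The sketch below follows that reading. The function name, the graph, and the precomputed distances are hypothetical, and the patent may define the construction differently.

```python
import math


def reduced_link_graph(graph, dist):
    """Keep only links that lie on a shortest path from the seed set.

    graph maps page -> list of (target_page, link_length).
    dist maps page -> link distance (math.inf if unreachable from the seeds).
    """
    reduced = {}
    for page, targets in graph.items():
        if math.isinf(dist[page]):
            continue  # pages unreachable from the seeds are excluded entirely
        kept = [
            (target, length)
            for target, length in targets
            # A link is on a shortest path when it accounts exactly for the
            # target's link distance.
            if math.isclose(dist[page] + length, dist[target])
        ]
        if kept:
            reduced[page] = kept
    return reduced


# Hypothetical graph and link distances; 106 is the seed, 120 is unreachable.
full_graph = {
    106: [(112, 1.0), (114, 0.5)],
    112: [(118, 2.0)],
    114: [(116, 0.5)],
    116: [(118, 0.5)],
    120: [(118, 0.1)],
}
link_dist = {106: 0.0, 112: 1.0, 114: 0.5, 116: 1.0, 118: 1.5, 120: math.inf}
reduced = reduced_link_graph(full_graph, link_dist)
```

In this example the link from page 112 to page 118 is dropped because a shorter route exists, and the link from the unreachable page 120 is ignored however short it is, which mirrors the idea that links from outside the reduced graph pass no weight.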
