What interested me most was Alan Eustace, Google's vice president of engineering, talking about Google's rankings, so I jumped straight to that part and watched it for about half an hour. The key points worth remembering are summarized below.
Alan first talked about spider crawling, which is basically a matter of following the links the spider finds.
Speed is crucial. The last figure Google posted on its homepage was 8 billion pages in its index. At a rate of one page per second, crawling 8 billion pages would take more than 250 years, so crawling at very high speed is essential.
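As a quick sanity check on that figure (my own arithmetic, not from the talk), one page per second really does work out to roughly 254 years:

```python
# Back-of-the-envelope check of the "more than 250 years" claim:
# crawling 8 billion pages at one page per second, single-threaded.
pages = 8_000_000_000
seconds_per_year = 60 * 60 * 24 * 365
years = pages / seconds_per_year
print(f"{years:.0f} years")  # roughly 254 years

# Which is why crawling is massively parallel: with, say, 10,000
# concurrent fetchers the same crawl takes on the order of 9 days.
days = pages / 10_000 / (60 * 60 * 24)
print(f"{days:.1f} days")
```

The 10,000-fetcher figure is purely illustrative; the point is that the problem is only tractable in parallel.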
Another problem with crawling is that roughly 50% of web pages are duplicates, so the 8 billion pages Google has indexed is the number left after discarding about half. The actual number of files crawled is probably closer to 20 billion.
Another danger when crawling is the infinite loop. For example, a page may link to a calendar, and that calendar may always have a "next month" button. Because the web application can generate calendar pages for next month indefinitely, a spider that keeps following that link will never get out.
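One common defense against such traps (my own illustration, not something Alan described) is a visited-URL set combined with a depth cap. Everything here, including `MAX_DEPTH` and the `fetch_links` stub, is hypothetical:

```python
from collections import deque
from urllib.parse import urljoin

MAX_DEPTH = 10  # illustrative cap; real crawlers use smarter heuristics


def crawl(start_url, fetch_links):
    """Breadth-first crawl that cannot loop forever.

    fetch_links(url) -> list of href strings found on that page
    (stubbed below; a real crawler would download and parse HTML).
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited_order = []
    while queue:
        url, depth = queue.popleft()
        visited_order.append(url)
        if depth >= MAX_DEPTH:    # stops "next month"-style traps
            continue
        for href in fetch_links(url):
            link = urljoin(url, href)
            if link not in seen:  # never revisit a URL
                seen.add(link)
                queue.append((link, depth + 1))
    return visited_order


# A toy "infinite calendar": every page links to the next month forever.
def calendar_links(url):
    month = int(url.rsplit("/", 1)[-1])
    return [f"/cal/{month + 1}"]


pages = crawl("https://example.com/cal/1", calendar_links)
print(len(pages))  # 11: bounded by MAX_DEPTH, not infinite
```

Without the depth cap and `seen` set, this crawl would never terminate; with them, it stops after following the trap a fixed number of steps.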
10% to 20% of websites update their content every month, so picking up updates matters as well.
Then there is the index, which works like the index of a book. For example, if a user searches for "heart attack", and "heart" appears in documents 5, 9, and 25 while "attack" appears in documents 7, 9, and 22, then document 9 is obviously a qualifying page. The search scope is thus restricted to pages that contain the search terms.
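The lookup in Alan's example can be sketched as an inverted index with posting-list intersection (a toy reconstruction of the idea, not Google's implementation):

```python
# Toy inverted index matching the example above: "heart" appears in
# documents 5, 9, 25 and "attack" in 7, 9, 22, so only document 9
# contains both query terms.
index = {
    "heart": {5, 9, 25},
    "attack": {7, 9, 22},
}


def search(query, index):
    """Return the documents that contain every term in the query."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()


print(search("heart attack", index))  # {9}
```

Real posting lists are sorted on disk and intersected with merge-style algorithms, but the set intersection captures the principle.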
Of course, that set is usually still huge. How do you pick the best results out of it? That requires going further and computing rankings.
Google uses more than two hundred signals to compute rankings, which was new information to me.
Alan mentioned the anchor text and Page Rank.
Take the Stanford University website as an example: many other websites link to it, so it is reasonable to assume that the Stanford site is highly authoritative. That is the idea behind PageRank.
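The idea can be sketched with the classic power-iteration form of PageRank (a minimal version of the published algorithm, not Google's production code; the toy link graph is my own):

```python
# Minimal PageRank by power iteration: pages with many incoming links
# (like Stanford's site in the example) end up with higher scores.
# The damping factor 0.85 is the value from the original PageRank paper.


def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:
                for out in outs:
                    new[out] += damping * rank[page] / len(outs)
        rank = new
    return rank


# Toy web: a, b, and c all link to "stanford"; stanford links back to a.
web = {
    "stanford": ["a"],
    "a": ["stanford"],
    "b": ["stanford"],
    "c": ["stanford"],
}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # "stanford" gets the highest rank
```

In this toy graph the heavily linked-to page wins, and `a` (which receives a link from the authoritative page) outranks `b` and `c`, which receive no links at all.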
Then there is anchor text, i.e., the link text. For example, if the Stanford site links to another website with "Knight fellows" as the anchor text, that text itself says something about the topic of the target page, even if the words "Knight fellows" never appear on that page at all.
This ranking process has to be automatic; it is not feasible to adjust rankings by hand.
Alan said that roughly 20% to 25% of search queries are unique, meaning they have never been searched before. Users are very creative and search in all kinds of forms.
A keyword search is handled by thousands of machines.
Alan went on to talk about spam techniques, such as leaving spam links in guestbooks; these have long since stopped working on Google. Likewise, some spammers build large numbers of websites and link them to one another, which also does not work on Google.
He added that whether such tricks work on other search engines, he wouldn't know.
When Google considers an algorithm adjustment, the decision depends on how many people benefit. For example, if after an adjustment 40% of users would find search quality improved, 40% would see no change, and 20% would find it degraded, Google's decision would be to make the adjustment.
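That decision rule can be written down in a couple of lines (the function and its thresholds are purely illustrative, not Google's actual launch process):

```python
# Illustrative launch rule: ship a change when more users see an
# improvement than a degradation. Not Google's real evaluation process.


def should_launch(improved, unchanged, degraded):
    """All arguments are fractions of evaluated queries/users."""
    assert abs(improved + unchanged + degraded - 1.0) < 1e-9
    return improved > degraded


print(should_launch(0.40, 0.40, 0.20))  # True: 40% better vs 20% worse
```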
It is impossible to satisfy 100% of users on 100% of search queries.
Alan also mentioned that search in many other languages is very difficult as well, Chinese word segmentation being one example.
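To see why segmentation is hard: Chinese text has no spaces between words, so a query must first be split into words. A classic baseline, shown here with a toy dictionary of my own (this is not Google's method), is greedy forward maximum matching; the example 北京大学生 is genuinely ambiguous between "Peking University + student" and "Beijing + university students":

```python
# Greedy forward maximum matching: at each position, take the longest
# dictionary word that matches, falling back to a single character.
DICTIONARY = {"北京", "大学", "北京大学", "生", "学生"}


def segment(text, dictionary, max_len=4):
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


print(segment("北京大学生", DICTIONARY))  # ['北京大学', '生']
```

The greedy strategy commits to 北京大学 ("Peking University") and leaves a dangling 生, which may or may not be the intended reading; resolving such ambiguities is exactly what makes the problem difficult.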
Nothing in the talk was really surprising, but it is not every day that Google's vice president of engineering talks about Google's ranking algorithm.
When I have time I'll watch it again and see what else is new.