I used to know nothing about this side of Chinese search engines, which I have to admit was rather ignorant of me. When I saw this article, my first thought was that if the two patents really are similar, or one is suspected of plagiarizing the other, then the core interests of both companies are at stake. And for a search engine optimizer like me, it is also a chance to learn something about the inner workings of Baidu's algorithm. So I immediately went to read Li Yanhong's patent application, and then reread Google's PR patent application.
My understanding is that the two patent applications are, at root, about two different things. Although they study the same object, namely links, the problems they set out to solve and the methods they use are different.
Hyperlink analysis aims to determine how relevant a file is to the search keywords. The method Li Yanhong proposes is that, in addition to the occurrences of the keywords in the file itself, one must also consider the keywords that appear in the backlinks pointing to that file.
Specifically, when a file is indexed into the database, it is recorded together with the hyperlinks pointing to it and the anchor text (link text) used in each of those hyperlinks. A second database is built for search terms: each word is recorded together with the hyperlinks whose link text contains it and the files those hyperlinks point to.
When a keyword is searched, the file or web page with the largest number of backlinks that use the keyword as link text is ranked as the most relevant result.
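To make the idea concrete, here is a minimal Python sketch of the two databases and the ranking rule just described. It is my own illustration under simplifying assumptions (links given as hypothetical `(source, target, anchor_text)` tuples, naive whitespace tokenization, and made-up function names), not the procedure as written in Li Yanhong's patent.

```python
from collections import defaultdict


def build_indexes(links):
    """Build the two databases from (source, target, anchor_text) tuples.

    doc_index:  target file -> list of (linking page, anchor text)
    term_index: word        -> list of (linking page, target file)
    """
    doc_index = defaultdict(list)
    term_index = defaultdict(list)
    for source, target, anchor in links:
        doc_index[target].append((source, anchor))
        # naive whitespace tokenization; each word is recorded once per link
        for word in set(anchor.lower().split()):
            term_index[word].append((source, target))
    return doc_index, term_index


def rank_by_anchor_text(term_index, keyword):
    """Rank target files by how many backlinks use the keyword in their link text."""
    counts = defaultdict(int)
    for _source, target in term_index.get(keyword.lower(), []):
        counts[target] += 1
    # most matching backlinks first; ties broken alphabetically for stable output
    return sorted(counts, key=lambda t: (-counts[t], t))


links = [
    ("s1", "pageA", "cheap flights"),
    ("s2", "pageA", "flights to Beijing"),
    ("s3", "pageB", "cheap hotels"),
    ("s4", "pageB", "flights"),
    ("s5", "pageA", "travel"),
]
doc_index, term_index = build_indexes(links)
print(rank_by_anchor_text(term_index, "flights"))  # ['pageA', 'pageB']
```

Note that pageA ranks first for "flights" purely because two of its backlinks use that word, regardless of what the page itself says.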
That is the basic idea; naturally, other variables also have to be taken into account. For example, when a search string contains several words, each word becomes one dimension of the search vector.
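A rough sketch of the multi-word case follows, assuming a standard vector space model: each query word is one dimension, a file's vector counts the backlinks whose anchor text contains each word, and files are scored by cosine similarity with the query vector. The cosine scoring and all names here are my assumptions for illustration; the patent's actual weighting may differ. Cosine rewards files whose backlinks cover all the query words in balance; it ignores raw link volume, which a real system would have to combine separately.

```python
import math
from collections import Counter


def anchor_vector(anchor_texts, query_terms):
    """For each query term, count the backlinks whose anchor text contains it."""
    counts = Counter()
    for anchor in anchor_texts:
        words = set(anchor.lower().split())
        for term in query_terms:
            if term in words:
                counts[term] += 1
    return [counts[t] for t in query_terms]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_multiword(docs, query):
    """docs maps each file to the anchor texts of its backlinks."""
    terms = query.lower().split()
    query_vec = [1] * len(terms)  # each query word is one dimension, weight 1
    scores = {doc: cosine(anchor_vector(anchors, terms), query_vec)
              for doc, anchors in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "pageA": ["cheap flights", "flights to Beijing"],  # vector [1, 2]
    "pageB": ["cheap hotels", "cheap rooms"],          # vector [2, 0]
    "pageC": ["travel"],                               # vector [0, 0]
}
print(rank_multiword(docs, "cheap flights"))  # ['pageA', 'pageB', 'pageC']
```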
What interests me more is that this patent application already mentions stemming, and also discusses the relevance of different text files.
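Stemming matters here because anchor text rarely matches the query word for word: a link reading "Flights" should still count for a search on "flight". Below is a deliberately tiny suffix-stripping sketch; the suffix list and function names are hypothetical, and real stemmers such as Porter's algorithm use far more rules. I don't know which stemming method the patent actually specifies.

```python
# Hypothetical, deliberately tiny suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common English suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def anchor_matches(anchor_text, keyword):
    """True if any word in the anchor text shares a stem with the keyword."""
    stems = {toy_stem(w) for w in anchor_text.split()}
    return toy_stem(keyword) in stems


print(anchor_matches("Cheap Flights to Beijing", "flight"))  # True
print(anchor_matches("cheap hotels", "flight"))  # False
```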
Google's PR patent clearly focuses on a problem that hyperlink analysis alone leaves unresolved. Google's patent application states explicitly that even after taking into account the links pointing to a file and their anchor text, search engines still face an important problem: links from different files carry different weights.
You can't expect a link from the White House website and a link from a student's personal website in a small town in Africa to carry the same voting value. Google PageRank is a method for measuring the importance of a page: the more links point to a page, and the higher the weight of the pages those links come from, the higher the importance, and thus the PageRank, of the target page.
Google's patent application gives specific details on how to calculate this importance, which they call PageRank. The calculation is iterative: the values must be recomputed over many rounds before the PR values approximate the final result.
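The iterative calculation can be sketched with power iteration. This uses a common normalized variant of the PageRank formula, PR(p) = (1 - d)/N + d × Σ PR(q)/L(q), where the sum runs over the pages q that link to p, L(q) is the number of outgoing links on q, and d is the damping factor (0.85 is the commonly cited value). The exact formulation in Google's patent may differ, and spreading the rank of pages with no outgoing links evenly over all pages is my own assumption for this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated recomputation (power iteration).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # every page receives the same base share (1 - d) / N
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # rank held by pages with no outgoing links is spread over all pages
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # each page splits its current rank evenly among the pages it links to
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Each target has exactly one backlink, but target1's comes from a well-linked hub.
links = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "hub": ["target1"],
    "student": ["target2"],
}
rank = pagerank(links)
print(rank["target1"] > rank["target2"])  # True
```

This mirrors the White House example: one link from an important page outweighs one link from an obscure page. Note that no search term appears anywhere in the computation.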
Google PR has nothing to do with the search terms or with a file's relevance to them. It only describes the importance, or standing, of a particular file.
Judging by how current search engines actually behave, both technologies have certainly been applied in their ranking algorithms, although possibly under different names. In particular, although Google has always championed PR, it rarely talks about its other technologies. Yet over the past year or two, Google's algorithm has placed more and more weight on link analysis, anchor text analysis, and stemming.
The analysis here concerns only the technical problems addressed in the two patent applications. It is not intended to comment on who came first, who was inspired by whom, and so on.