To put it simply, Google's mechanism for crawling web pages has changed since the completion of the Bigdaddy data centre upgrade. Instead of each spider crawling web pages directly, a crawl caching proxy fetches the page, and the different spiders then get the content from the cache, saving bandwidth.
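The idea can be sketched roughly as follows; this is not Google's actual implementation, just a minimal illustration, and the class and spider names are hypothetical:

```python
# Minimal sketch of a crawl caching proxy: several spiders request the
# same URL, but the page is fetched from the origin server only once.

class CrawlCachingProxy:
    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn   # performs the real network fetch
        self.cache = {}            # url -> cached page content
        self.origin_fetches = 0    # count of bandwidth-consuming fetches

    def get(self, url):
        if url not in self.cache:          # cache miss: crawl once
            self.cache[url] = self.fetch_fn(url)
            self.origin_fetches += 1
        return self.cache[url]             # cache hit: served from cache

def fake_fetch(url):
    # Stand-in for a real HTTP request, for illustration only.
    return f"<html>content of {url}</html>"

proxy = CrawlCachingProxy(fake_fetch)
# Three hypothetical spiders each ask for the same page:
for spider in ["websearch-bot", "blogsearch-bot", "ads-bot"]:
    page = proxy.get("http://example.com/")

print(proxy.origin_fetches)  # 1 -- one real fetch serves all three spiders
```

Each spider still decides for itself what to request and when; the proxy only deduplicates the actual fetches, which is why bandwidth is saved without changing crawl frequency or schedules.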
A more complete translation of Matt Cutts' post can be found on the "disillusioned" and "Xiaotian" blogs.
What I want to add is: First: Matt Cutts specifically pointed out that this new crawling mechanism will not make your web pages be crawled faster and will not affect rankings. The crawl caching proxy also does not affect the frequency or schedule with which each spider crawls. It's just that each spider no longer fetches the page directly from the web, but gets it from the cache instead.
Second: What caught my attention is that Matt Cutts said this crawl caching proxy only went live with the Bigdaddy update. Because it worked so smoothly, Matt Cutts himself didn't realize the new mechanism was already running until someone else discovered it. This shows that even Matt Cutts can't keep up with the latest developments in every department, so what else might Matt Cutts not be aware of?
Third: Matt Cutts said that the goal of this mechanism is to save bandwidth, not to detect cloaked pages. I think the implication between the lines is that, with the same technology, Google could use other spiders to detect cloaked pages. Of course, I may just be overly sensitive.
In addition, the number of indexed pages on many websites has dropped sharply recently, and I suspect this is related to confusion caused by the new crawling mechanism. Clearly it is not the ranking algorithm that is causing the drop in indexed pages, but a problem with spider crawling.