In general, search engines dislike duplicate content pages. They try to determine which version is the original and then ignore the other copies.
There are two points worth noting:
1) There is no fixed ratio for judging whether a web page is a duplicate. It is not the case, for example, that a page is classed as a duplicate once 60% or 80% of its content matches other pages. If there were such a ratio, things would be much simpler.
2) Duplicate pages do not bring penalties. Search engines discard the duplicate pages, but do not punish the page they believe to be the original source.
However, the true original source can still end up being punished. For example, the search engine may misjudge, treating the original source as a copy and the copy as the original source.
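How search engines actually measure overlap is not stated here. As a rough illustration of why a single cutoff is hard to pick, the sketch below assumes a simple word-shingle Jaccard similarity. The function names `shingles` and `jaccard` and the shingle size of 3 are illustrative assumptions, not any search engine's real method.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short text: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
copy = "the quick brown fox jumps over the sleepy dog"
print(round(jaccard(original, copy), 3))  # 5 shared shingles out of 9 -> 0.556
```

Changing a single word drops the score well below 1.0 here, and the score also shifts with the shingle size chosen, which hints at why a fixed percentage would be hard to apply in practice.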
Duplicate content pages generally arise in the following ways:
1) URL normalization problems, where the same page can be reached at several different URLs.
2) Agents' and retailers' websites often copy product information from the manufacturer's website. There is nothing wrong with this, and manufacturers generally agree to it, but most agents, retailers, and wholesalers copy the text directly without making any changes. As a result, these e-commerce sites are flooded with duplicate content pages.
3) Print versions. Many websites offer a printer-friendly version of each page. If you do not block these pages with a robots.txt file, the print versions may be treated as duplicate pages.
4) Web page content generated from RSS. Many websites, especially news websites, use the RSS feeds of other websites to generate their content, so the same content appears on the original source and on many other websites.
5) Session IDs on e-commerce websites. Search engine spiders are given a different Session ID each time they visit a page, although the page content is the same. Because the Session ID parameter differs, each visit is treated as a different web page.
6) Too little content on the page. Every page has unavoidable common parts, such as navigation bars and copyright notices. If the body of a page is so short that it is outweighed by these shared parts, the page may be considered duplicate content.
7) Plagiarism, reprinting, and so on. Sometimes other people copy your website's content, sometimes it is reprinted in good faith, and sometimes the author voluntarily submits the same article to several websites. All of these can produce duplicate content pages.
8) Mirror websites. Mirror sites used to be very popular: when a website was too busy or too slow, users could view the content or download files from a mirror instead. This also carries the risk of duplicate content pages.
9) Products or service types that differ very little. For example, some websites classify their products or services by region, although what is actually offered in each region is the same. On these region-based pages only the place names change, and the rest of the content is identical.
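Items 1 and 5 in the list above both come down to several URLs serving one page. The following is a minimal sketch of how a site might collapse such URLs into one canonical form, using only Python's standard `urllib.parse`. The parameter names in `SESSION_PARAMS`, the treatment of `www.` and `index.html`, and the function name `normalize_url` are assumptions for illustration; a real site would substitute its own rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; real sites use their own keys.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}
DEFAULT_PORTS = {("http", 80), ("https", 443)}


def normalize_url(url: str) -> str:
    """Return one canonical form for URLs assumed to serve the same page."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Assumption: the www and non-www hosts serve identical content.
    if host.startswith("www."):
        host = host[4:]
    # Keep the port only when it is not the default for the scheme.
    if parts.port and (scheme, parts.port) not in DEFAULT_PORTS:
        host = f"{host}:{parts.port}"
    # Assumption: /index.html is the same page as the directory root.
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Drop session parameters and sort the rest so order does not matter.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    query = urlencode(sorted(params))
    # The fragment is dropped because it is not sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


a = normalize_url("http://WWW.Example.com/index.html?sid=abc123&id=7")
b = normalize_url("http://example.com/?id=7&sid=zzz999")
print(a == b)  # both collapse to http://example.com/?id=7
```

A normalized URL like this could then be used for internal links or for a canonical reference on the page, so that spiders handed different Session IDs still see one address per page.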