How can I get details about search engine spiders in the background with a PHP script? For example, where a spider came from, when it came, which pages it crawled, and which pages it likes or dislikes. Is this done with the same technique as detecting a visitor's browser? Thank you, and sorry to bother you again.
Traffic statistics and analysis happen to be a topic I have not covered before, so over these few days I will discuss several aspects of it. Today, let's look at how to read the server raw log file.
The web server automatically records some information about each visit and stores it in the server's raw log file.
Hosting providers generally let you download this file from the control panel. If your hosting provider does not provide the raw log file, you should switch to another provider.
The raw log file is just a plain text file, so you can open it with a text editor such as WordPad or Notepad.
Below is a line I picked at random from last month's log file for this blog. Let's look at what information it contains:
220.127.116.11 - - [02/Jul/2006:15:30:41 +0800] "GET /seoblog/2006/04/17/user-friendly-website/ HTTP/1.1" 200 19031 "http://www.baidu.com/s?wd=PRADA%B9%D9%B7%BD%CD%F8%D5%BE&cl=3" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Alexa Toolbar)"
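To make the fields easier to see, here is a minimal Python sketch that splits a line like the one above into named parts. It assumes the log uses the common "combined" layout shown in the sample; the pattern and the function name `parse_line` are illustrative choices, not something the server provides.

```python
import re

# Assumed layout: IP, two identity fields, [time], "request",
# status, size, "referrer", "user agent" (the "combined" log format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = (
    '220.127.116.11 - - [02/Jul/2006:15:30:41 +0800] '
    '"GET /seoblog/2006/04/17/user-friendly-website/ HTTP/1.1" 200 19031 '
    '"http://www.baidu.com/s?wd=PRADA%B9%D9%B7%BD%CD%F8%D5%BE&cl=3" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Alexa Toolbar)"'
)

fields = parse_line(sample)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Lines in a different format simply return `None`, so a script can skip them instead of crashing.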
User IP address
This is the visitor's IP address, which tells you where the visitor came from. If you look up the location of this IP address, you can see that this visitor is from Beijing, China.
The timestamp, 02/Jul/2006:15:30:41, is the time at which the file was accessed. Combined with the IP address, it lets you follow the order in which a particular user moved from one page to another.
The number after it, +0800, is the time zone offset relative to Greenwich Mean Time.
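As a small illustration of how the time and the offset fit together, the sketch below turns the sample timestamp into a time-zone-aware value and converts it to GMT/UTC. It assumes the default English month abbreviations, since `%b` depends on the locale.

```python
from datetime import datetime, timezone

raw = "02/Jul/2006:15:30:41 +0800"

# %z reads the +0800 offset, so the result knows its own time zone.
local_time = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
utc_time = local_time.astimezone(timezone.utc)

print(local_time.isoformat())  # 2006-07-02T15:30:41+08:00
print(utc_time.isoformat())    # 2006-07-02T07:30:41+00:00
```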
GET /seoblog/2006/04/17/user-friendly-website/ HTTP/1.1
The request method is usually either GET or POST. Apart from some CGI scripts, it is normally GET, which fetches a web page or image file from the server.
The line in the example means: get the file /seoblog/2006/04/17/user-friendly-website/ using the HTTP/1.1 protocol.
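The request string breaks down into three parts separated by spaces. A minimal sketch, with `split_request` as a hypothetical helper name:

```python
def split_request(request):
    """Split 'METHOD path PROTOCOL' into its parts, or return None."""
    parts = request.split()
    if len(parts) != 3:
        return None  # malformed or empty request line
    method, path, protocol = parts
    return {"method": method, "path": path, "protocol": protocol}

request = "GET /seoblog/2006/04/17/user-friendly-website/ HTTP/1.1"
print(split_request(request))
```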
Return status code
This shows whether the server responded successfully. 200 means the file was retrieved successfully. 404 means the file was not found, 401 means a password is required, 403 means access is forbidden, 500 means a server error, and of course there are many other codes.
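When scanning a whole log, it can help to count how often each status code occurs. The sketch below uses the meanings listed above and falls back to the general class of the code (2xx, 3xx, and so on) for anything else. The list of statuses is made-up sample data, not taken from a real log.

```python
from collections import Counter

# Meanings of the codes mentioned in the text.
KNOWN_CODES = {
    200: "success",
    401: "password required",
    403: "forbidden",
    404: "file not found",
    500: "server error",
}

# Fallback by first digit for codes not listed above.
STATUS_CLASSES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}

def describe_status(code):
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    return STATUS_CLASSES.get(code // 100, "unknown")

statuses = [200, 200, 404, 301, 500, 200]  # hypothetical sample data
counts = Counter(statuses)
for code, hits in sorted(counts.items()):
    print(code, describe_status(code), hits)
```

A sudden rise in 404 or 500 counts is usually the first thing worth checking.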
The number after the status code is the size of the file retrieved, which is 19,031 bytes in our example.
The next field, the referrer, tells us which page the visitor came from to reach this page. It may be another page of the same website, or a results page from a search engine.
In the example, the visitor came from Baidu, and the search keyword was “PRADA official website”.
This information is very important, because it shows which sites and which search keywords bring you visitors.
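The keyword in the referrer is percent-encoded, so it has to be decoded before it can be read. The sketch below assumes the keyword lives in Baidu's `wd` parameter, as in the sample, and that the bytes are GB2312-encoded; other search engines use different parameter names and encodings, so treat both defaults as assumptions.

```python
from urllib.parse import urlparse, parse_qs

def search_keyword(referrer, param="wd", encoding="gb2312"):
    """Return (site, keyword); keyword is None if the parameter is absent."""
    parsed = urlparse(referrer)
    # errors="replace" keeps going even if the encoding guess is wrong.
    query = parse_qs(parsed.query, encoding=encoding, errors="replace")
    values = query.get(param)
    return parsed.netloc, (values[0] if values else None)

referrer = "http://www.baidu.com/s?wd=PRADA%B9%D9%B7%BD%CD%F8%D5%BE&cl=3"
site, keyword = search_keyword(referrer)
print(site)
print(keyword)
```

The decoded keyword is the Chinese phrase for “PRADA official website”.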
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Alexa Toolbar)
This last field holds some information about the browser and the user's computer.
In the example above, the browser identifies itself as Mozilla-compatible (that is, Netscape-compatible), the operating system is Windows NT 5.1 (Windows XP), the browser is IE 6.0, and the Alexa Toolbar is installed.
If the visitor is using a different kind of computer, browser, or program, you may see other values in this field, such as:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0 Maxthon; Alexa Toolbar)
http://www.gougou.com RSS Online Reader
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Their meaning needs no explanation; you can tell what each one is at a glance.
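Search engine spiders announce themselves in this same field, which is how a script can tell them apart from ordinary browsers. The sketch below is a simple substring check; the token list is a hypothetical starting point rather than a complete registry, and since a user agent string can be faked, it is only a first filter.

```python
# Hypothetical list of substrings that commonly appear in spider user agents.
BOT_TOKENS = ("googlebot", "baiduspider", "bot", "spider", "crawl")

def is_spider(user_agent):
    """Return True if the user agent string looks like a search engine spider."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

agents = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0 Maxthon; Alexa Toolbar)",
    "http://www.gougou.com RSS Online Reader",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

for agent in agents:
    print(is_spider(agent), agent)
```

Once the spider's lines are picked out this way, the other fields on each line give the time of the visit and the pages it requested.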
Tracking user paths
This is something many website operators need to study.
If you exclude image requests from the log file, remove the entries from other visitors interleaved in between, and list only the pages viewed from a particular IP address over a period of time, you can see what the user did on your website and which pages they read.
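The filtering described above can be sketched as follows. The entries are made-up examples standing in for already-parsed log lines (IP, time, requested path), listed in the order they appear in the log, and the list of image extensions is an assumption to adjust for your own site.

```python
from collections import defaultdict

# Assumed set of image file extensions to leave out.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")

def pages_by_visitor(entries):
    """Group page views by IP address, skipping image requests."""
    paths = defaultdict(list)
    for ip, time, path in entries:
        # Ignore any query string when checking the file extension.
        if path.split("?", 1)[0].lower().endswith(IMAGE_EXTENSIONS):
            continue
        paths[ip].append((time, path))
    return dict(paths)

# Hypothetical parsed entries, in log (chronological) order.
entries = [
    ("10.0.0.1", "02/Jul/2006:15:30:41", "/seoblog/2006/04/17/user-friendly-website/"),
    ("10.0.0.1", "02/Jul/2006:15:30:42", "/images/logo.gif"),
    ("10.0.0.2", "02/Jul/2006:15:30:45", "/seoblog/"),
    ("10.0.0.1", "02/Jul/2006:15:31:10", "/seoblog/about/"),
]

for ip, views in pages_by_visitor(entries).items():
    print(ip)
    for time, path in views:
        print("  ", time, path)
```

Grouping by IP address alone is an approximation, since several people can share one address.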
User behavior information is a great help to website operators.