Access logs
We use the following format, which is also nginx's predefined “combined” format:
$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
The fields are:
- $remote_addr – IP address from which the request was made
- $remote_user – user name supplied with HTTP authentication. This is empty (logged as "-") for most apps, as modern apps do not use HTTP-based authentication.
- [$time_local] – timestamp in the server's local time zone
- “$request” – the full original request line: HTTP method (GET, POST, etc.), requested path including any query string, and HTTP protocol version
- $status – HTTP response code from the server
- $body_bytes_sent – size of the response body sent to the client, in bytes (response headers are not counted)
- “$http_referer” – referring URL (if present)
- “$http_user_agent” – user agent as seen by the server
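The awk commands in this article depend on how whitespace splitting maps the combined format onto numbered fields. As a minimal sketch, the snippet below splits one made-up log line (the IP address, path and user agent are illustrative values, not taken from a real log) and prints the fields used most often later on:

```shell
# A fabricated log line in the combined format (illustrative values only).
line='192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'

# With awk's default whitespace splitting:
#   $1  = client IP           $4, $5 = timestamp and UTC offset
#   $6  = method (with the leading quote)
#   $7  = requested path      $8 = protocol (with the trailing quote)
#   $9  = status code         $10 = body bytes sent
fields=$(printf '%s\n' "$line" | awk '{print $1, $7, $9, $10}')
echo "$fields"
```

This mapping assumes that $remote_user contains no spaces and that the request line has exactly three space-separated parts; otherwise the field numbers shift.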
Let's explore some commands that help analyse these logs.
Sort access by Response Codes
cut -d '"' -f3 access.log | cut -d ' ' -f2 | sort | uniq -c | sort -rn
Sample Output:
210433 200
38587 302
17571 304
4544 502
2616 499
1144 500
706 404
355 504
355 301
252 000
9 403
6 206
2 408
2 400
The same thing can be done using awk:
awk '{print $9}' access.log | sort | uniq -c | sort -rn
Sample Output:
210489 200
38596 302
17572 304
4544 502
2616 499
1144 500
706 404
355 504
355 301
252 000
9 403
6 206
2 408
2 400
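The same counts can also be gathered in a single awk pass with an associative array, which avoids sorting the whole log before counting. This is only a sketch: the log lines below are fabricated for illustration, and the variable names are arbitrary.

```shell
# Fabricated sample log in the combined format (illustrative values only).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "curl/8.0"
192.0.2.11 - - [10/Oct/2023:13:55:37 +0000] "GET /about HTTP/1.1" 200 1200 "-" "curl/8.0"
192.0.2.12 - - [10/Oct/2023:13:55:38 +0000] "GET /contact HTTP/1.1" 200 900 "-" "curl/8.0"
192.0.2.10 - - [10/Oct/2023:13:55:39 +0000] "GET /missing HTTP/1.1" 404 162 "-" "curl/8.0"
192.0.2.13 - - [10/Oct/2023:13:55:40 +0000] "GET /old-page HTTP/1.1" 404 162 "-" "curl/8.0"
192.0.2.14 - - [10/Oct/2023:13:55:41 +0000] "GET /videos/ HTTP/1.1" 502 166 "-" "curl/8.0"
EOF

# Count responses per status code in one pass, then order by count.
status_counts=$(awk '{count[$9]++} END {for (code in count) print count[code], code}' "$log" | sort -rn)
echo "$status_counts"

rm -f "$log"
```

Only the small list of distinct status codes is sorted at the end, so this form tends to scale better on very large logs.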
As you can see, the log shows that more than 700 requests returned a 404!
Let's find out which links are broken.
The following command filters the requests that resulted in a 404 response and counts them per URL, so the most frequently requested missing pages come first.
awk '($9 ~ /404/)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn
For EasyEngine, use instead:
awk '($8 ~ /404/)' access.log | awk '{print $8}' | sort | uniq -c | sort -rn
Sample Output (truncated):
21 /members/katrinakp/activity/2338/
19 /blogger-to-wordpress/robots.txt
14 /rtpanel/robots.txt
Similarly, for 502 (Bad Gateway) responses, run the following command:
awk '($9 ~ /502/)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn
Sample Output (truncated):
728 /wp-admin/install.php
466 /
146 /videos/
130 /wp-login.php
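Since the 404 and 502 commands differ only in the status code, they can be folded into one small shell function. The sketch below assumes the combined format; the function name top_urls_for_status and the sample log lines are hypothetical, introduced only for illustration. It filters and prints in one awk call and compares the status exactly instead of using a regex.

```shell
# Fabricated sample log in the combined format (illustrative values only).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "curl/8.0"
192.0.2.11 - - [10/Oct/2023:13:55:37 +0000] "GET /missing HTTP/1.1" 404 162 "-" "curl/8.0"
192.0.2.12 - - [10/Oct/2023:13:55:38 +0000] "GET /missing HTTP/1.1" 404 162 "-" "curl/8.0"
192.0.2.10 - - [10/Oct/2023:13:55:39 +0000] "GET /old-page HTTP/1.1" 404 162 "-" "curl/8.0"
192.0.2.13 - - [10/Oct/2023:13:55:40 +0000] "GET /videos/ HTTP/1.1" 502 166 "-" "curl/8.0"
EOF

# Hypothetical helper: list URLs for one status code, most frequent first.
# Usage: top_urls_for_status STATUS LOGFILE
top_urls_for_status() {
    awk -v status="$1" '$9 == status {print $7}' "$2" | sort | uniq -c | sort -rn
}

not_found=$(top_urls_for_status 404 "$log")
bad_gateway=$(top_urls_for_status 502 "$log")
echo "$not_found"
echo "$bad_gateway"

rm -f "$log"
```

Passing the status with -v keeps the awk program itself fixed, so the same function works for any response code.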
Who is requesting broken links (or URLs resulting in 502)?
awk -F\" '($2 ~ "/wp-admin/install.php"){print $1}' access.log | awk '{print $1}' | sort | uniq -c | sort -rn
Sample Output:
14 50.133.11.248
12 97.106.26.244
11 108.247.254.37
10 173.22.165.123
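The command above treats the path as a regular expression, so the "." matches any character and the pattern can match anywhere in the request line. Where an exact path is wanted, a string comparison on the path field is one alternative. A minimal sketch, assuming the combined format and using fabricated log lines:

```shell
# Fabricated sample log in the combined format (illustrative values only).
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.7 - - [10/Oct/2023:13:55:36 +0000] "GET /wp-admin/install.php HTTP/1.1" 502 166 "-" "curl/8.0"
198.51.100.7 - - [10/Oct/2023:13:55:37 +0000] "GET /wp-admin/install.php HTTP/1.1" 502 166 "-" "curl/8.0"
203.0.113.9 - - [10/Oct/2023:13:55:38 +0000] "GET /wp-admin/install.php HTTP/1.1" 502 166 "-" "curl/8.0"
203.0.113.9 - - [10/Oct/2023:13:55:39 +0000] "GET /wp-admin/installXphp HTTP/1.1" 404 162 "-" "curl/8.0"
192.0.2.10 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 2326 "-" "curl/8.0"
EOF

# Compare field 7 (the requested path) as a plain string, then count clients.
# The installXphp line would match the regex version but is excluded here.
clients=$(awk -v path="/wp-admin/install.php" '$7 == path {print $1}' "$log" | sort | uniq -c | sort -rn)
echo "$clients"

rm -f "$log"
```

Because the path field includes any query string, an exact comparison will skip requests such as /wp-admin/install.php?step=1; the regex form is the better fit when those should be counted too.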
404 for PHP files – mostly hacking attempts
awk '($9 ~ /404/)' access.log | awk -F\" '($2 ~ /^GET .*\.php/)' | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 20
Most requested URLs
awk -F\" '{print $2}' access.log | awk '{print $2}' | sort | uniq -c | sort -rn
Most requested URLs containing a given string (here “ref”)
awk -F\" '($2 ~ "ref"){print $2}' access.log | awk '{print $2}' | sort | uniq -c | sort -rn
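If the search string may contain regex metacharacters, awk's index() function gives a literal substring match instead. The sketch below assumes the combined format; the variable name needle and the log lines are made up for illustration.

```shell
# Fabricated sample log in the combined format (illustrative values only).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /?ref=newsletter HTTP/1.1" 200 2326 "-" "curl/8.0"
192.0.2.11 - - [10/Oct/2023:13:55:37 +0000] "GET /?ref=newsletter HTTP/1.1" 200 2326 "-" "curl/8.0"
192.0.2.12 - - [10/Oct/2023:13:55:38 +0000] "GET /preferences HTTP/1.1" 200 900 "-" "curl/8.0"
192.0.2.13 - - [10/Oct/2023:13:55:39 +0000] "GET /about HTTP/1.1" 200 1200 "-" "curl/8.0"
EOF

# index() returns the position of needle within the path (0 if absent),
# so any path containing the literal string is printed.
needle="ref"
matches=$(awk -v needle="$needle" 'index($7, needle) {print $7}' "$log" | sort | uniq -c | sort -rn)
echo "$matches"

rm -f "$log"
```

Note that this is a plain substring test, so /preferences is counted as well because it contains "ref".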
Useful: Tweaking fastcgi-buffers using access logs
Recommended Reading: http://www.the-art-of-web.com/system/logs/ – explains log parsing very nicely.