Using $upstream_cache_status in access.log

Nginx exposes a variable, $upstream_cache_status, that records how the cache handled each request. It works with any cached upstream (proxy, FastCGI, uwsgi, SCGI). We use it mainly for our WordPress + fastcgi_cache setup.

Define a new log_format in the http block of nginx.conf:

log_format rt_cache '$remote_addr - $upstream_cache_status [$time_local]  '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

Then, in the server block, either replace a line like:

access_log   /var/log/nginx/example.com.access.log;

with:

access_log   /var/log/nginx/example.com.access.log rt_cache;

or write to multiple access logs, like below:

access_log   /var/log/nginx/example.com.access.log;
access_log   /var/log/nginx/example.com.cache.log rt_cache;

Note: when using multiple logs, give each one a separate file name.

Once you have made the above changes, reload the nginx config and wait for the log files to fill up!

Analysing cache efficiency:

HIT vs MISS vs other statuses

Run a command like the one below on your access.log file (or on the cache log, if you kept a separate one):

awk '{print $3}' access.log | sort | uniq -c | sort -rn

Sample output:

    800 HIT
    779 -
    392 BYPASS
     19 EXPIRED
     14 MISS

Note: a dash ("-") means the request never reached the upstream module. Most likely nginx answered it directly, e.g. with return 403, return 444, and so on.
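The counts above can be turned into a single hit ratio. Below is a minimal sketch, assuming the rt_cache format from this article (cache status in the third field). The function name cache_hit_ratio is made up for illustration, and treating every non-dash status as "reached the cache layer" is an assumption you may want to adjust.

```shell
# cache_hit_ratio FILE   (hypothetical helper name)
# Prints the share of HIT among requests that reached the upstream
# cache layer, i.e. lines whose third field is not "-".
cache_hit_ratio() {
    awk '$3 != "-"  { total++ }
         $3 == "HIT" { hits++ }
         END {
             if (total == 0) { print "no cacheable requests"; exit }
             printf "%.1f%% HIT (%d of %d)\n", 100 * hits / total, hits, total
         }' "$1"
}
```

Applied to a log with the sample counts above (800 HIT, 392 BYPASS, 19 EXPIRED, 14 MISS, dashes excluded), it would report 65.3% HIT (800 of 1225).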

MISS Request URLs

awk '$3 ~ /MISS/ {print $7}' access.log | sort | uniq -c | sort -rn

BYPASS Request URLs

awk '$3 ~ /BYPASS/ {print $7}' access.log | sort | uniq -c | sort -rn
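The two per-status commands differ only in the status name, so they can be folded into one helper. This is a sketch under the same rt_cache format assumption. urls_by_status is a hypothetical name, and the default of 20 rows is an arbitrary choice. It uses an exact string comparison on the third field rather than a regex match, which is a small deliberate change from the one-liners.

```shell
# urls_by_status STATUS FILE [N]   (hypothetical helper name)
# Prints the N most frequent request URIs (field 7) whose cache
# status (field 3) equals STATUS exactly. N defaults to 20.
urls_by_status() {
    awk -v s="$1" '$3 == s { print $7 }' "$2" |
        sort | uniq -c | sort -rn | head -n "${3:-20}"
}

# Usage:
#   urls_by_status MISS access.log
#   urls_by_status EXPIRED access.log 5
```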

MISS vs BYPASS

MISS occurs when a request matches a pattern that is configured for caching, but no cached copy existed at the time of the request. In a correct configuration, subsequent requests will be served from the cache, subject to the caching duration and other parameters.

BYPASS occurs when a pattern is explicitly configured NOT to use the cache, e.g. skipping the cache for logged-in users. Subsequent requests matching that pattern will also be bypassed.
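Since a MISS should normally be followed by HITs for the same URL, URLs that MISS repeatedly and never HIT are worth a closer look. The following is a rough heuristic, not a definitive check: it assumes the rt_cache format from this article, and it compares on the request URI only, while your actual cache key may include more than that. The name never_hit and the threshold of two MISSes are illustrative choices.

```shell
# never_hit FILE   (hypothetical helper name)
# Lists request URIs that were a MISS at least twice and never a HIT,
# most frequent first. Output format: "<miss count> <URI>".
never_hit() {
    awk '$3 == "MISS" { miss[$7]++ }
         $3 == "HIT"  { hit[$7]++ }
         END {
             for (u in miss)
                 if (miss[u] >= 2 && !(u in hit))
                     print miss[u], u
         }' "$1" | sort -rn
}
```

A URL showing up here may simply be rarely requested, so treat the output as a list of candidates to investigate.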

Recommendations:

  1. If MISSes are frequent, consider increasing the cache size and/or duration.
  2. For BYPASS, analyse the patterns. We often configure requests with query strings to BYPASS the cache. On a high-traffic site, it can be wise to cache the output of certain requests with query strings, even if the query string alters the page output. Be careful not to cache wp_nonce and auth_tokens. I will post more about this soon!
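To analyse BYPASS patterns as suggested in the second recommendation, one option is to count which query string parameter names appear on bypassed requests. This is a sketch assuming the rt_cache format from this article. bypass_params is a hypothetical name, and the splitting is deliberately naive (on "?", "&" and "=", with no URL decoding).

```shell
# bypass_params FILE   (hypothetical helper name)
# Counts query string parameter names seen on BYPASS requests,
# most frequent first. Output format: "<count> <parameter name>".
bypass_params() {
    awk '$3 == "BYPASS" && index($7, "?") {
             qs = substr($7, index($7, "?") + 1)   # text after the "?"
             n = split(qs, pairs, "&")
             for (i = 1; i <= n; i++) {
                 split(pairs[i], kv, "=")
                 if (kv[1] != "") print kv[1]
             }
         }' "$1" | sort | uniq -c | sort -rn
}
```

Parameter names that dominate the output are the first candidates to review when deciding which query strings could be cached safely.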

Must Read: Useful Commands to parse access.log