Calculating bandwidth from a combined-format web server log

Given a combined-format web server access log, such as those generated by Apache, it can be useful to know the total amount of data transferred across all requests in that log. The task is simple: extract the field listing the number of bytes sent for each request, and add them all up. For something so simple, there is an odd lack of examples or pre-made scripts that do this. Or, at least, I couldn’t find any.

I wrote my solution, calculate-data-transfer.py, in Python:

import re
import sys

fileName = sys.argv[1]

# Skip past the quoted request line and the status code,
# then capture the bytes-sent field that follows them.
compiledExpression = re.compile(r'.*".*" [-0-9]* ([0-9]*)')

totalBytes = 0

# errors="replace" keeps stray non-UTF-8 bytes from aborting the run
with open(fileName, errors="replace") as fpFullLog:
    for line in fpFullLog:
        matches = compiledExpression.match(line)
        if matches is None:
            continue
        numBytes = matches.group(1)
        if len(numBytes) > 0:  # avoid zero-length matches
            totalBytes += int(numBytes)

print("%.2f MiB" % (totalBytes / 2.0**20))

Use is simple:

% python calculate-data-transfer.py access.log

The script prints the total data transfer in MiB (mebibytes), where 1 MiB is 2^20 bytes, rather than in MB (megabytes), where 1 MB is 10^6 bytes.
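To make the difference between the two units concrete, here is a minimal sketch of both conversions. The helper names to_mib and to_mb and the sample total are made up for illustration; they are not part of the script.

```python
def to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


def to_mb(num_bytes):
    """Convert a byte count to megabytes (1 MB = 10**6 bytes)."""
    return num_bytes / 10**6


# Hypothetical total, chosen to be exactly 50 MiB
total_bytes = 50 * 2**20

print("%.2f MiB" % to_mib(total_bytes))  # 50.00 MiB
print("%.2f MB" % to_mb(total_bytes))    # 52.43 MB
```

The same byte count reads about 5% larger in MB than in MiB, so it matters which unit a hosting provider uses when quoting bandwidth limits.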


Comments

Matt Michie

It is scary when you google search something and come across people you know. You saved me 5 minutes of work ;)

Anonymous

Also, it is much easier to do something like:

awk '{ sum += $10 } END { print sum }' access_log

from the command line. You are guaranteed that *nix will have awk. You never know if you’ll have Python.
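For comparison, the awk one-liner above corresponds roughly to the following Python sketch, which splits each line on whitespace and sums the tenth field. The function name sum_tenth_field and the sample log lines are invented for illustration. Like the awk version, it assumes the request line contains exactly three space-separated parts; a URL or user name containing a space would shift the fields.

```python
def sum_tenth_field(lines):
    """Sum the 10th whitespace-separated field of each line, as awk's $10 would."""
    total = 0
    for line in lines:
        fields = line.split()
        # Skip short lines and the "-" placeholder used when no body was sent
        if len(fields) >= 10 and fields[9].isdecimal():
            total += int(fields[9])
    return total


# Made-up sample lines in combined log format
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326 "-" "Mozilla/4.08"',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /b.gif HTTP/1.0" 304 - "-" "Mozilla/4.08"',
]

print(sum_tenth_field(sample))  # 2326
```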

Anonymous

was it so easy? :))

i tried this script and saw that it is really working.

is bandwidth calculation so easy? :)

thanks!

snk

If the access log file is huge, does this script still work fine?

Samat Jain

I’ve used this script on logs as large as 8 GiB. It runs slower than I like (i.e. much slower than reading from disk directly; I haven’t cared to pinpoint the performance bottleneck), but it works completely fine otherwise.
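One plausible but unverified suspect for the slowness is the regular expression, whose leading greedy wildcards force backtracking on every line. The sketch below is an alternative that avoids regular expressions by splitting each line on the double quotes around the request. It has not been benchmarked against the original script, the name sum_bytes_sent is hypothetical, and it assumes the request line itself contains no double quotes.

```python
def sum_bytes_sent(lines):
    """Sum the bytes-sent field, located by the quotes around the request line."""
    total = 0
    for line in lines:
        # parts: [text before the request, the request, everything after it]
        parts = line.split('"', 2)
        if len(parts) < 3:
            continue
        # After the request come the status code, then the bytes-sent field
        tail = parts[2].split(None, 2)
        if len(tail) >= 2 and tail[1].isdecimal():
            total += int(tail[1])
    return total


# Made-up sample lines; the first has a space inside the requested path
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a b.gif HTTP/1.0" 200 100 "-" "Mozilla/4.08"',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /c.gif HTTP/1.0" 304 - "-" "Mozilla/4.08"',
]

print(sum_bytes_sent(sample))  # 100

# With a real log, pass the open file so lines are streamed one at a time:
#     with open("access.log", errors="replace") as f:
#         print(sum_bytes_sent(f))
```

Because the file object is iterated line by line, memory use stays constant regardless of log size.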