By Samat Jain
May 25, 2006 - 3:03am
Given a combined web server access log, such as the ones generated by Apache, it can be useful to know the total amount of data transferred across all requests in that log. This task is simple: extract the field listing the number of bytes sent for each request, and add them all up. For something so simple, there is an odd lack of examples or pre-made scripts that do this. Or, at least, I couldn’t find any.
I wrote my solution, calculate-data-transfer.py, in Python:
import re
import sys

fileName = sys.argv[1]
# Skip past the quoted request and the status code, then capture the bytes-sent field
compiledExpression = re.compile(".*\".*\" [-0-9]* ([0-9]*)")
fpFullLog = open(fileName)
totalBytes = 0
for line in fpFullLog:
    matches = compiledExpression.match(line)
    if matches is None:
        continue
    bytes = matches.group(1)
    if len(bytes) > 0:  # avoid zero-length matches
        bytes = int(bytes)
        totalBytes += bytes
fpFullLog.close()
print("%.2f MiB" % (totalBytes / 2.0**20))
Use is simple:
% python calculate-data-transfer.py access.log
The script prints the total data transfer in MiB, that is, in units of 2^20 bytes rather than 10^6 bytes.
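For readers on Python 3, the same approach can be sketched as a pair of small functions. This is only an illustration under stated assumptions, not the original script: the names BYTES_SENT, total_bytes and to_mib are hypothetical, and the pattern assumes combined-format lines where the status code and byte count directly follow the quoted request.

```python
import re
import sys

# Same idea as the script above: skip past the quoted request line and the
# status code, then capture the bytes-sent field (empty when it is "-").
# BYTES_SENT, total_bytes and to_mib are illustrative names, not from the post.
BYTES_SENT = re.compile(r'.*".*" [-0-9]* ([0-9]*)')


def total_bytes(lines):
    """Sum the bytes-sent field over an iterable of log lines."""
    total = 0
    for line in lines:
        match = BYTES_SENT.match(line)
        if match is None:
            continue  # line does not look like an access-log entry
        field = match.group(1)
        if field:  # empty when the server logged "-" instead of a number
            total += int(field)
    return total


def to_mib(byte_count):
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return byte_count / 2**20


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the log lazily, line by line, so large files are not loaded whole.
    with open(sys.argv[1], errors="replace") as log:
        print("%.2f MiB" % to_mib(total_bytes(log)))
```

Because total_bytes accepts any iterable of strings, it can be tried on a short list of sample lines before being pointed at a real log file.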
Comments
Matt Michie on June 4, 2007 - 12:14pm wrote…
It is scary when you google search something and come across people you know. You saved me 5 minutes of work ;)
Anonymous on June 26, 2007 - 12:38am wrote…
Also, it is much easier to do something like:
awk '{ sum += $10 } END { print sum }' access_log
From the command line. You are guaranteed that *nix will have awk. Never know if you’ll have python.
Anonymous on July 24, 2007 - 9:55am wrote…
was it so easy? :))
i tried this script and saw that it is really working.
is bandwidth calculation so easy? :)
thanks!
snk on December 30, 2009 - 7:33am wrote…
If the access log file is huge, does this script still work fine?
Samat Jain on January 1, 2010 - 10:52pm wrote…
I’ve used this script on logs as large as 8 GiB. It runs slower than I like (i.e. much slower than reading from disk directly; I haven’t cared to pinpoint the performance bottleneck), but it works completely fine otherwise.