Calculating bandwidth from a combined-format web server log

Given a combined-format web server access log, such as the ones Apache generates, it can be useful to know the total amount of data transferred across all requests in that log. The task is simple: extract the field listing the number of bytes sent for each request, and add them all up. For something so simple, there is an odd lack of examples or pre-made scripts that do this. Or, at least, I couldn’t find any.

I wrote my solution, calculate-data-transfer.py, in Python:

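import re
import sys

fileName = sys.argv[1]

# Skip past the quoted request and the status code, then capture the bytes-sent field.
compiledExpression = re.compile(r'.*".*" [-0-9]* ([0-9]*)')

totalBytes = 0

# errors="replace" keeps a stray non-UTF-8 byte in a log line from aborting the run
with open(fileName, errors="replace") as fpFullLog:
  for line in fpFullLog:
    matches = compiledExpression.match(line)

    if matches is None:
      continue

    byteCount = matches.group(1)

    if len(byteCount) > 0: # the field is "-" when no body was sent, giving an empty match
      totalBytes += int(byteCount)

print("%.2f MiB" % (totalBytes / 2.0**20))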
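To see what the pattern picks out, here is a minimal sketch that runs the same expression over two made-up log lines. The addresses, paths, and sizes are hypothetical, and the helper name extract_bytes is only for illustration; it is not part of the script. The second line has "-" in the size field, which is what Apache writes when no response body is sent.

```python
import re

# Same pattern as in the script above.
pattern = re.compile(r'.*".*" [-0-9]* ([0-9]*)')

def extract_bytes(line):
    """Return the bytes-sent field as an int, or 0 when it is missing or "-"."""
    match = pattern.match(line)
    if match is None or match.group(1) == "":
        return 0
    return int(match.group(1))

# Hypothetical sample lines in combined format.
with_body = '192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"'
no_body = '192.0.2.1 - - [10/Oct/2010:13:56:01 -0700] "GET /index.html HTTP/1.1" 304 - "-" "Mozilla/5.0"'

print(extract_bytes(with_body))  # 2326
print(extract_bytes(no_body))    # 0
```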

Usage is simple:

% python calculate-data-transfer.py access.log

The script prints the total in MiB, that is, in units of 2^20 bytes rather than 10^6 bytes (MB).
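The two units differ by about 5%, which adds up on large logs. A small sketch of both conversions, using a made-up byte count (the function names to_mib and to_mb are illustrative, not part of the script):

```python
def to_mib(total_bytes):
    """Convert bytes to mebibytes (1 MiB = 2**20 = 1,048,576 bytes)."""
    return total_bytes / 2.0**20

def to_mb(total_bytes):
    """Convert bytes to megabytes (1 MB = 10**6 = 1,000,000 bytes)."""
    return total_bytes / 10.0**6

total = 5 * 2**20  # hypothetical total: exactly 5 MiB

print("%.2f MiB" % to_mib(total))  # 5.00 MiB
print("%.2f MB" % to_mb(total))    # 5.24 MB
```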

Comments

Anonymous Visitor

If the access log file is huge, does this script still work fine?

Samat Jain

Not that I'm aware of

I’ve used this script on logs as large as 8 GiB. It runs slower than I like (i.e. much slower than reading from disk directly; I haven’t cared to pinpoint the performance bottleneck), but it works completely fine otherwise.

Anonymous Visitor

was it so easy? :))

i tried this script and saw that it is really working.

is bandwidth calculation so easy? :)

thanks!

Anonymous Visitor

your regex appears to be off.

Also, it is much easier to do something like:

awk '{ sum += $10 } END { print sum }' access_log

From the command line. You are guaranteed that *nix will have awk. Never know if you’ll have python.
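For comparison, roughly the same field-splitting approach in Python might look like the sketch below. It assumes the default combined format with a well-formed request line and no spaces in the user name, so that the byte count is the tenth whitespace-separated field; the function name sum_tenth_field and the sample lines are hypothetical.

```python
def sum_tenth_field(lines):
    """Add up the tenth whitespace-separated field, like awk's $10."""
    total = 0
    for line in lines:
        fields = line.split()
        # Skip short lines and the "-" placeholder used when no body was sent.
        if len(fields) >= 10 and fields[9].isdecimal():
            total += int(fields[9])
    return total

# Hypothetical sample lines in combined format.
sample = [
    '192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /a HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [10/Oct/2010:13:56:01 -0700] "GET /a HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]
print(sum_tenth_field(sample))  # 2326
```

Passing an open file object instead of the sample list would total a real log line by line. The result is in bytes, as with the awk version.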

Anonymous Visitor

mmmm google

It is scary when you google search something and come across people you know. You saved me 5 minutes of work ;)