These days we use Amazon CloudFront for content delivery. Amazon makes it very easy to serve files from an Amazon Simple Storage Service (S3) bucket through a CloudFront distribution. Once you use CloudFront as your Content Delivery Network (CDN), your next task is monitoring the usage. For this, CloudFront can store its access logs in an S3 bucket. My hurdle was processing the log files CloudFront stores there. For sites hosted with Apache I use AWStats to read the logs, so AWStats got my vote here as well. Please follow the steps one by one 😉

1. First, download the log files stored in the S3 bucket. For this I used a Python script by wpstorm.net, with a few modifications to make it work for me. Please follow that blog post if you need any help setting up the required libraries (the script depends on the boto module).

get-aws-logs.py

#! /usr/bin/env python
"""Download and delete log files for AWS S3 / CloudFront
 
Usage: python get-aws-logs.py [options]
 
Options:
  -b ..., --bucket=...    AWS Bucket
  -p ..., --prefix=...    AWS Key Prefix
  -a ..., --access=...    AWS Access Key ID
  -s ..., --secret=...    AWS Secret Access Key
  -l ..., --local=...     Local Download Path
  -h, --help              Show this help
  -d                      Show debugging information while parsing
 
Examples:
  get-aws-logs.py -b eqxlogs
  get-aws-logs.py --bucket=eqxlogs
  get-aws-logs.py -p logs/cdn.example.com/
  get-aws-logs.py --prefix=logs/cdn.example.com/
 
This program requires the boto module for Python to be installed.
"""
 
__author__ = "Johan Steen (http://www.artstorm.net/)"
__version__ = "0.5.0"
__date__ = "28 Nov 2010"
 
import boto
import getopt
import sys, os
from boto.s3.key import Key
 
_debug = 0
 
class get_logs:
    """Download log files from the specified bucket and path and then delete them from the bucket.
    Uses: http://boto.s3.amazonaws.com/index.html
    """
    # Set default values
    AWS_BUCKET_NAME = '{AWS_BUCKET_NAME}'
    AWS_KEY_PREFIX = ''
    AWS_ACCESS_KEY_ID = '{AWS_ACCESS_KEY_ID}'
    AWS_SECRET_ACCESS_KEY = '{AWS_SECRET_ACCESS_KEY}'
    LOCAL_PATH = '/tmp/'
    # Don't change below here
    s3_conn = None
    bucket = None
    bucket_list = None
 
    def __init__(self):
        self.s3_conn = None
        self.bucket_list = None
        self.bucket = None
 
    def start(self):
        """Connect, get file list, copy and delete the logs"""
        self.s3Connect()
        self.getList()
        self.copyFiles()
 
    def s3Connect(self):
        """Creates a S3 Connection Object"""
        self.s3_conn = boto.connect_s3(self.AWS_ACCESS_KEY_ID, self.AWS_SECRET_ACCESS_KEY)
 
    def getList(self):
        """Connects to the bucket and then gets a list of all keys available with the chosen prefix"""
        self.bucket = self.s3_conn.get_bucket(self.AWS_BUCKET_NAME)
        self.bucket_list = self.bucket.list(self.AWS_KEY_PREFIX)
 
    def copyFiles(self):
        """Creates the local folder if it does not already exist, then downloads all keys, archives them in the bucket and deletes the originals"""
        # Using makedirs as it's recursive
        if not os.path.exists(self.LOCAL_PATH):
            os.makedirs(self.LOCAL_PATH)
        for key_list in self.bucket_list:
            key = str(key_list.key)
            # Get the log filename ([-1] accesses the last item in a list)
            filename = key.split('/')[-1]
            # os.path.join handles the separator between folder and filename
            local_file = os.path.join(self.LOCAL_PATH, filename)
            # Check if the file exists locally; if not, download it
            if not os.path.exists(local_file):
                key_list.get_contents_to_filename(local_file)
                print "Downloaded\t" + filename
            # Check that the file was downloaded; if so, archive and delete it in the bucket
            if os.path.exists(local_file):
                key_list.copy(self.bucket, 'archive/' + key_list.key)
                print "Moved\t\t" + filename
                key_list.delete()
                print "Deleted\t\t" + filename
 
def usage():
    print __doc__
 
def main(argv):
    try:
        opts, args = getopt.getopt(argv, "hb:p:l:a:s:d", ["help", "bucket=", "prefix=", "local=", "access=", "secret="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    logs = get_logs()
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt == '-d':
            global _debug
            _debug = 1
        elif opt in ("-b", "--bucket"):
            logs.AWS_BUCKET_NAME = arg
        elif opt in ("-p", "--prefix"):
            logs.AWS_KEY_PREFIX = arg
        elif opt in ("-a", "--access"):
            logs.AWS_ACCESS_KEY_ID = arg
        elif opt in ("-s", "--secret"):
            logs.AWS_SECRET_ACCESS_KEY = arg
        elif opt in ("-l", "--local"):
            logs.LOCAL_PATH = arg
    logs.start()
 
if __name__ == "__main__":
    main(sys.argv[1:])

Note: The above script downloads the S3 logs to the specified folder, archives each log under an archive/ prefix in the bucket, and deletes the original. Please make sure you put in your Amazon access keys (or pass them with the -a and -s options).
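As a quick sanity check, you can run the script by hand before automating anything. Every value below is a placeholder, not a real bucket or key:

```shell
# All values here are placeholders; substitute your own bucket, prefix,
# access key, secret key and download folder.
python get-aws-logs.py \
    --bucket=my-cloudfront-logs \
    --prefix=logs/cdn.example.com/ \
    --access=YOUR_ACCESS_KEY_ID \
    --secret=YOUR_SECRET_ACCESS_KEY \
    --local=/tmp/cf-logs/
```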

2. Next we have a bash script that uses the above Python script to download the log files, combines them into a single log file, and then has it analyzed by AWStats.

Warning: Please read through the script files and make any changes needed for your setup.
Note: You should have AWStats installed on your system; the script below invokes it.
Note: You can download the script files at the end of this blog post, where an AWStats configuration with a custom setup for the CloudFront log format is also provided.
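For reference, the custom LogFormat line in that AWStats configuration typically looks like the sketch below. It maps the CloudFront download-distribution log fields (edge location, bytes sent, client IP, method, host, URI stem, status, referrer, user agent, query string) onto AWStats tags, and it assumes the date and time columns have already been joined into a single field, which is exactly what the bash script takes care of:

```
LogFormat="%time2 %cluster %bytesd %host %method %virtualname %url %code %referer %ua %query"
```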

get-aws-logs.sh

#!/bin/bash
# Initial, cron script to download and merge AWS logs
# 29/11 - 2010, Johan Steen
 
# 1. Set up variables
date=$(date +%Y-%m-%d)
static_folder="/tmp/log_static_$date/"

# 2. Download the CloudFront logs from S3 and decompress them
mkdir -pv "$static_folder"
python /var/www/scripts/get-aws-logs.py --prefix=logs/www.imthi.com --local="$static_folder"
gunzip --quiet "${static_folder}"*
 
# 3. Merge the logs into one file, joining the date and time columns with a space
/usr/local/awstats/tools/logresolvemerge.pl "${static_folder}"* | sed -r -e 's/([0-9]{4}-[0-9]{2}-[0-9]{2})\t([0-9]{2}:[0-9]{2}:[0-9]{2})/\1 \2/g'  >> /var/www/logs/www.imthi.com.log
 
# 4. Clean up and update the AWStats database
rm -vrf "$static_folder"
/usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=imthi -update
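The sed expression above can be puzzling at first: CloudFront writes the date and time as two tab-separated fields, while AWStats expects a single timestamp, so the pipeline joins them with a space. Here is a small Python sketch of the same substitution (the sample log line is made up for illustration):

```python
import re

# Same pattern as the sed command: a date field, a tab, then a time field.
pattern = re.compile(r'(\d{4}-\d{2}-\d{2})\t(\d{2}:\d{2}:\d{2})')

# A made-up, abbreviated CloudFront log line for illustration.
line = "2010-11-28\t14:30:05\tLAX1\t2390\t192.0.2.10\tGET"

# The tab between date and time becomes a space; the rest is untouched.
merged = pattern.sub(r'\1 \2', line)
print(merged)
```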

I would suggest test-running the above scripts in a staging / testing environment before moving them to production. Again, please update the scripts with your own domain details and Amazon access keys.
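Once everything works, the whole pipeline can be automated with cron. A sketch of a crontab entry follows; the script path and the schedule are assumptions, so adjust both to your setup:

```
# Fetch, merge and analyze the CloudFront logs every day at 02:30.
# The path to get-aws-logs.sh is an assumption; point it at your copy.
30 2 * * * /bin/bash /var/www/scripts/get-aws-logs.sh >> /var/log/get-aws-logs.log 2>&1
```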

Download the scripts for fetching and processing Amazon CloudFront logs with AWStats.

Have a nice journey exploring the cloud 😉