Squid is a popular caching web proxy, used to limit bandwidth usage and to monitor web access. It is really powerful and widely used (at least by me; I didn't check the statistics :P). There are many Squid log analyzers around, but it seems no one ever bothered to write a Logwatch (a generic, scriptable log analyzer) filter for Squid. Yeah, I know: "Code it!"
This article is a first try; my Squid filter will surely improve over time :)
How to write a logwatch filter
The files needed
First source of information: the Logwatch website. That's not the main site, because my admins block port 81... OK, here is what you need to write the new filter (with /etc/logwatch as the root directory for the new files):
- conf/logfiles/squid.conf: the log files that will be parsed
- conf/services/squid.conf: this is where you define the configuration options of your script
- scripts/services/squid: the script that parses the log files. This is where the work is done
As always, Logwatch is really easy to use: you copy the files into your directory, choose which log files will be analyzed, and configure your filter.
What do you put in the files?
conf/logfiles/squid.conf
I don't know about yours, but on my system the Squid log files are in /var/log/squid. The LogFile and Archive paths are relative to Logwatch's log directory (/var/log by default). Put this in your file:
########################################################
# Define log file group for squid #
########################################################
LogFile = squid/access.log
LogFile = squid/access.log.1
#LogFile = squid/cache.log
#LogFile = squid/cache.log.1
Archive = squid/access.log.*.gz
#Archive = squid/cache.log.*.gz
I know, I'm a lazy bastard: I didn't use the cache and store logs. It's a first draft, I said... A few comments:
- access.log records every request that goes through Squid, with its details (time, client IP, whether the object was found in the cache or not...)
- cache.log contains configuration information, errors encountered, etc. It may be useful to add support for this log to my script
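To make the access.log description concrete, here is a minimal sketch (not the filter's actual code) that splits one line in Squid's default "native" format into named fields. The field names (client, code_status, hierarchy_peer...) and the parse_native_line helper are my own hypothetical choices, and it assumes the native format has not been changed with the logformat directive:

```perl
use strict;
use warnings;

# Field names for Squid's native access.log format, in order (names are hypothetical).
my @native_fields = qw(time elapsed client code_status bytes method url ident hierarchy_peer type);

# Split one native-format line on whitespace and return a hash reference of
# named fields, or undef if the line does not have exactly 10 fields.
sub parse_native_line {
    my ($line) = @_;
    my @values = split ' ', $line;    # ' ' also swallows the padding before 'elapsed'
    return unless @values == @native_fields;
    my %entry;
    @entry{@native_fields} = @values;
    # "TCP_MISS/200" holds both the Squid result code and the HTTP status.
    @entry{qw(result_code http_status)} = split m{/}, $entry{code_status}, 2;
    # "DIRECT/192.0.2.1" holds the hierarchy code and the peer host.
    @entry{qw(hierarchy peer)} = split m{/}, $entry{hierarchy_peer}, 2;
    return \%entry;
}

my $sample = '1222034468.123    245 192.168.1.10 TCP_MISS/200 1534 GET http://example.com/ - DIRECT/192.0.2.1 text/html';
my $e = parse_native_line($sample);
printf "%s %s %s bytes\n", $e->{client}, $e->{result_code}, $e->{bytes};    # 192.168.1.10 TCP_MISS 1534 bytes
```

Splitting on whitespace instead of one big regex means a line with an unexpected ident field (a user name instead of "-") still parses.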
conf/services/squid.conf
Here is the (simple) configuration file I use:
########################################################
# Configuration file for squid filter #
########################################################
Title = "squid"
LogFile = squid
I'm still a lazy bastard, so I didn't define a LogFormat option. I should really add it, since the format of access.log can be changed in the Squid configuration (logformat directive).
scripts/services/squid
And then, the big ugly script that filters the logs:
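Such a LogFormat option could look like the sketch below. It assumes a hypothetical "$squid_fields = ..." line in conf/services/squid.conf, and it assumes Logwatch hands "$"-prefixed service options to the script as environment variables (here $ENV{'squid_fields'}). The option name, the field names and field_index are all my own inventions. When the option is missing, it falls back to the native layout:

```perl
use strict;
use warnings;

# Native access.log layout, used when no field list is configured (names are hypothetical).
use constant DEFAULT_FIELDS =>
    'time elapsed client code_status bytes method url ident hierarchy_peer type';

# Turn a space-separated field list into a name => column index map.
sub field_index {
    my ($spec) = @_;
    $spec = DEFAULT_FIELDS unless defined $spec && $spec =~ /\S/;
    my @names = split ' ', $spec;
    my %index;
    @index{@names} = 0 .. $#names;
    return \%index;
}

# Hypothetical option, assumed to be exported by Logwatch as $ENV{'squid_fields'}.
my $col = field_index($ENV{'squid_fields'});

my @values = split ' ', '1222034468.123    245 192.168.1.10 TCP_MISS/200 1534 GET http://example.com/ - DIRECT/192.0.2.1 text/html';
print "client=$values[$col->{client}] bytes=$values[$col->{bytes}]\n";
```

With this approach, the counting code asks for columns by name, so only the configuration changes when the logformat does.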
#!/usr/bin/perl
use strict;
use warnings;
use Logwatch ':all';
my $Debug = $ENV{'LOGWATCH_DEBUG'} || 0;
my $Detail = $ENV{'LOGWATCH_DETAIL_LEVEL'} || 0;
my $DebugCounter = 0;
if ( $Debug >= 5 )
{
    print STDERR "\n\nDEBUG: Inside SquidCache Filter \n\n";
    $DebugCounter = 1;
}
my %httpstatus = (
    100 => 'Continue',
    101 => 'Switching Protocols',
    102 => 'Processing',                      # WebDAV
    200 => 'OK',
    201 => 'Created',
    202 => 'Accepted',
    203 => 'Non-Authoritative Information',
    204 => 'No Content',
    205 => 'Reset Content',
    206 => 'Partial Content',
    207 => 'Multi-Status',                    # WebDAV
    300 => 'Multiple Choices',
    301 => 'Moved Permanently',
    302 => 'Found',
    303 => 'See Other',
    304 => 'Not Modified',
    305 => 'Use Proxy',
    307 => 'Temporary Redirect',
    400 => 'Bad Request',
    401 => 'Unauthorized',
    402 => 'Payment Required',
    403 => 'Forbidden',
    404 => 'Not Found',
    405 => 'Method Not Allowed',
    406 => 'Not Acceptable',
    407 => 'Proxy Authentication Required',
    408 => 'Request Timeout',
    409 => 'Conflict',
    410 => 'Gone',
    411 => 'Length Required',
    412 => 'Precondition Failed',
    413 => 'Request Entity Too Large',
    414 => 'Request-URI Too Large',
    415 => 'Unsupported Media Type',
    416 => 'Request Range Not Satisfiable',
    417 => 'Expectation Failed',
    422 => 'Unprocessable Entity',            # WebDAV
    423 => 'Locked',                          # WebDAV
    424 => 'Failed Dependency',               # WebDAV
    500 => 'Internal Server Error',
    501 => 'Not Implemented',
    502 => 'Bad Gateway',
    503 => 'Service Unavailable',
    504 => 'Gateway Timeout',
    505 => 'HTTP Version Not Supported',
    507 => 'Insufficient Storage',            # WebDAV
);
# Squid result codes (not used yet, kept for future reports)
my @squidresult = (
    'TCP_HIT',
    'TCP_MISS',
    'TCP_REFRESH_HIT',
    'TCP_REF_FAIL_HIT',
    'TCP_REFRESH_MISS',
    'TCP_CLIENT_REFRESH_MISS',
    'TCP_IMS_HIT',
    'TCP_SWAPFAIL_MISS',
    'TCP_NEGATIVE_HIT',
    'TCP_MEM_HIT',
    'TCP_DENIED',
    'TCP_OFFLINE_HIT',
    'UDP_HIT',
    'UDP_MISS',
    'UDP_DENIED',
    'UDP_INVALID',
    'UDP_MISS_NOFETCH',
    'NONE',
);
# %squidstatus: result code => [number of requests, bytes, elapsed time in ms]
my %squidstatus;
# %BWbyIP: client IP => bytes
my %BWbyIP = ();
my $i = 0;
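while (defined(my $ThisLine = <STDIN>))
{
    if ( $Debug >= 5 )
    {
        print STDERR "DEBUG($DebugCounter): $ThisLine";
        $DebugCounter++;
    }
    chomp($ThisLine);
    # Native access.log format:
    # time elapsed(ms) client resultcode/status bytes method URL ident hierarchy/peer type
    # Assumes the ident field is always "-". Skip lines that do not match:
    # after a failed match, $1..$10 would still hold the previous line's values.
    next unless $ThisLine =~ /(\d+\.\d+)\s+(\d+) (\d+\.\d+\.\d+\.\d+) (\D+)\/(\d{3}) (\d+) (\D+) (\S+) - (\S+) (\S+)/;
    my $when = localtime($1);
    #print "time $when duration:$2 from $3 resultcode:$4 httpresult:$5 bytes:$6 method:$7 uri:$8 hierarchy:$9 type:$10\n";
    $BWbyIP{$3} += $6;
    $squidstatus{$4}[0]++;        # number of requests
    $squidstatus{$4}[1] += $6;    # bytes
    $squidstatus{$4}[2] += $2;    # elapsed time (ms)
    $i++;
}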
# Top 20 clients by bandwidth (fewer if there are fewer clients)
my @BWbyIPsorted = sort { $BWbyIP{$b} <=> $BWbyIP{$a} } keys %BWbyIP;
my $last = $#BWbyIPsorted < 19 ? $#BWbyIPsorted : 19;
foreach my $ip (@BWbyIPsorted[0 .. $last])
{
    print "IP: $ip => $BWbyIP{$ip} bytes\n";
}
print "\nResult codes:\n\n";
foreach my $k (keys(%squidstatus))
{
    # [0] = requests, [1] = bytes, [2] = total elapsed time in milliseconds
    print "$k : $squidstatus{$k}[0] reqs, $squidstatus{$k}[1] bytes, $squidstatus{$k}[2] time\n";
}
exit(0);
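The %httpstatus table above is not used yet. Here is a minimal, self-contained sketch of one way it could be used: counting requests per HTTP status code and printing each code with its description. It copies only a few entries of the table, reads sample lines instead of STDIN, and count_status is a hypothetical helper, not part of the filter:

```perl
use strict;
use warnings;

# A few entries copied from the %httpstatus table of the filter.
my %httpstatus = (200 => 'OK', 304 => 'Not Modified', 403 => 'Forbidden', 404 => 'Not Found');

# Count requests per HTTP status, taken from the "resultcode/status" field
# (e.g. "TCP_MISS/200", the 4th field) of native access.log lines.
sub count_status {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f >= 4 && $f[3] =~ m{/(\d{3})$};
        $count{$1}++;
    }
    return \%count;
}

my @sample = (
    '1222034468.123    245 192.168.1.10 TCP_MISS/200 1534 GET http://example.com/ - DIRECT/192.0.2.1 text/html',
    '1222034469.456      3 192.168.1.10 TCP_IMS_HIT/304 230 GET http://example.com/ - NONE/- text/html',
    '1222034470.789     12 192.168.1.11 TCP_MISS/200 9021 GET http://example.com/a.png - DIRECT/192.0.2.1 image/png',
);
my $count = count_status(@sample);
print "\nHTTP status codes:\n\n";
for my $code (sort { $a <=> $b } keys %$count) {
    # Fall back to "Unknown" for codes missing from the table.
    my $label = $httpstatus{$code} || 'Unknown';
    print "$code $label : $count->{$code} reqs\n";
}
```

With the sample lines, this prints "200 OK : 2 reqs" and "304 Not Modified : 1 reqs".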
It's really simple (and badly written). The script shows bandwidth usage by client IP and statistics on the Squid result codes (WARNING: it may not be legal to use it on your network; I only show it as an example). Here is the output of this script:
################### Logwatch 7.3.1 (09/15/06) ####################
Processing Initiated: Mon Sep 22 20:01:08 2008
Date Range Processed: yesterday
( 2008-Sep-21 )
Period is day.
Detail Level of Output: 5
Type of Output: unformatted
Logfiles for Host: eclip4
##################################################################
squid Begin
IP: *.*.*.* => 8043309 bytes
IP: *.*.*.* => 6675804 bytes
IP: *.*.*.* => 2884992 bytes
IP: *.*.*.* => 668433 bytes
...
Result codes:
TCP_REFRESH_MISS : 177 reqs, 7007 bytes, 177 time
TCP_MISS : 731 reqs, 5487253 bytes, 6731 time
TCP_HIT : 547 reqs, 7656 bytes, 47 time
TCP_DENIED : 18 reqs, 67195 bytes, 18 time
TCP_REFRESH_HIT : 89 reqs, 277882 bytes, 589 time
TCP_MEM_HIT : 12 reqs, 2074 bytes, 812 time
TCP_CLIENT_REFRESH_MISS : 39 reqs, 66520 bytes, 39 time
TCP_IMS_HIT : 2078 reqs, 33792 bytes, 078 time
TCP_NEGATIVE_HIT : 82 reqs, 8064 bytes, 82 time
squid End
###################### Logwatch End #########################
The values displayed here are intended to be crap; don't worry, my proxy works fine! Now, if someone finds this interesting, I can improve the script, add features, etc. This was a really quick and dirty attempt at a Logwatch filter for Squid. The base for future improvements is already here: something like a log format, and tables of HTTP status codes and Squid result codes. Some features that could be useful: number of requests and bandwidth used per hour/day/month, most visited pages, cache efficiency...
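As a rough idea of two of those features, the sketch below computes a cache hit ratio and the bytes transferred per hour of the day from native-format lines. It is a hypothetical helper (summarise), not part of the filter. Its main assumption is that any result code containing "HIT" (TCP_HIT, TCP_MEM_HIT, TCP_IMS_HIT...) counts as served from the cache. Hours are computed in the local time zone:

```perl
use strict;
use warnings;

# Return (hit ratio in percent, hashref of hour-of-day => bytes).
# Assumption: any Squid result code containing "HIT" counts as a cache hit.
sub summarise {
    my @lines = @_;
    my ($total, $hits) = (0, 0);
    my %bytes_by_hour;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f == 10;                  # native format has 10 fields
        my ($result) = split m{/}, $f[3];      # "TCP_HIT/200" -> "TCP_HIT"
        $total++;
        $hits++ if $result =~ /HIT/;
        my $hour = (localtime(int $f[0]))[2];  # hour of day (local time)
        $bytes_by_hour{$hour} += $f[4];
    }
    my $ratio = $total ? 100 * $hits / $total : 0;
    return ($ratio, \%bytes_by_hour);
}

my @sample = (
    '1222034468.123    245 192.168.1.10 TCP_MISS/200 1534 GET http://example.com/ - DIRECT/192.0.2.1 text/html',
    '1222034469.456      2 192.168.1.10 TCP_HIT/200 512 GET http://example.com/a.css - NONE/- text/css',
    '1222034470.789      1 192.168.1.11 TCP_MEM_HIT/200 230 GET http://example.com/b.js - NONE/- application/javascript',
    '1222034471.000      0 192.168.1.12 TCP_DENIED/403 3720 GET http://example.org/ - NONE/- text/html',
);
my ($ratio, $by_hour) = summarise(@sample);
printf "Cache hit ratio: %.1f%% of requests\n", $ratio;    # 50.0% here
for my $h (sort { $a <=> $b } keys %$by_hour) {
    printf "%02d:00 %d bytes\n", $h, $by_hour->{$h};
}
```

This measures a request hit ratio. A byte hit ratio would sum $f[4] for hits instead of counting lines.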