Analyze hourly Apache log traffic

From snippet wiki

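If you run several virtual hosts on one IP address, you may see a network traffic peak without knowing which host caused it. When your local web statistics software doesn't tell you enough, have a look at the access log files with this PHP command-line script. It expects one subdirectory per virtual host under /var/log/apache2/, each containing an access.log, and prints the bytes sent per hour of the day: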

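<?php
define('STARTDIR', '/var/log/apache2/');

// Find current log files (one subdirectory per virtual host)
$aDirs = array();
$hDir = opendir(STARTDIR);
if(false === $hDir) {
        fwrite(STDERR, 'Cannot open '.STARTDIR."\n");
        exit(1);
} // if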
while (false !== ($entry = readdir($hDir))) {
        if(('.'==$entry) || ('..'==$entry)) continue;

        if(is_dir(STARTDIR.$entry)) {
                $aDirs[] = $entry;
        } // if
} // while
closedir($hDir);

foreach($aDirs as $sDir) {
        $sFile = STARTDIR.$sDir.'/access.log';
        if(!is_readable($sFile)) continue; // skip directories without a readable access.log
        echo "$sFile\n";

        $aSum = array();
        $hFile = fopen($sFile, 'r');
        while(false !== ($line = fgets($hFile))) {
                // host, identity, user, [timestamp], "request", status, bytes sent
                if(preg_match('/^\S+ \S+ \S+ \[(\S+ [+\-]\d{4})\] "([^"]+)" (\d+) (\d+)\s/', $line, $matches)) {
                        $sTS = $matches[1];
                        $iSize = (int)$matches[4];
                        if(preg_match('@\d\d/[a-zA-Z]{3}/\d{4}:(\d\d):@', $sTS, $matches2)) {
                                $iHour = $matches2[1];
                                if(!isset($aSum[$iHour])) $aSum[$iHour]=0;
                                $aSum[$iHour] += $iSize;
                        } // if
                } // if
        } // while
        fclose($hFile);

        foreach($aSum as $iHour => $iSum) {
                $iSum = number_format($iSum, 0, ',', '.');
                echo "\t$iHour: $iSum\n";
        } // foreach
} // foreach
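
To check the parsing on a single line before running the script against real logs, the per-line step can be isolated in a small helper. This is only a sketch: the function name parseHourAndBytes and the sample line are made up for illustration, it assumes PHP 7.1 or later and the common/combined log format, and it counts a size of "-" (no response body) as 0 bytes.

```php
<?php
// Hypothetical helper (not part of the script above): returns
// [hour, bytes] for one access-log line, or null if the line does not match.
function parseHourAndBytes(string $line): ?array
{
    // host, identity, user, [dd/Mon/yyyy:HH:MM:SS +zzzz], "request", status, size
    $re = '/^\S+ \S+ \S+ \[\d{2}\/[A-Za-z]{3}\/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\] "[^"]*" \d{3} (\d+|-)/';
    if (!preg_match($re, $line, $m)) {
        return null;
    }
    // Apache logs "-" as the size when no body was sent; count that as 0.
    return [$m[1], $m[2] === '-' ? 0 : (int) $m[2]];
}

// Made-up sample line, for illustration only.
$sample = '192.0.2.10 - alice [10/Oct/2023:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"';

[$hour, $bytes] = parseHourAndBytes($sample);
echo "$hour: $bytes\n"; // prints "13: 2326"
```

Returning null for lines that do not match makes it easy to count skipped lines and spot a log format that differs from what the pattern expects.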

The script only reads the current access.log file, so make sure logrotate rotates it daily. On Debian, logrotate runs from /etc/cron.daily, which the default /etc/crontab starts at 06:25. Move that to shortly after midnight so that the current file covers the whole day. Note that this crontab line only runs cron.daily when anacron is not installed. Edit your local /etc/crontab to something like:

5 0     * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )