ReferrerCop
ReferrerCop parses Apache log files and AWStats data files and removes entries for referring URLs that match a list of known referrer spammers. By default, it writes the filtered result to standard output. Run your Apache logs through ReferrerCop before they’re processed by AWStats (or by any other log analyzer), or run your AWStats data files through it afterward, to eliminate annoying referrer spam from your web statistics.
Filtering is performed using a blacklist and an optional whitelist. ReferrerCop comes with a large, frequently updated blacklist.
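To make the blacklist and whitelist idea concrete, here is a minimal Ruby sketch. It is not ReferrerCop's actual code or API: the patterns, the classify method, and the rule that a whitelist match overrides a blacklist match are all assumptions made for illustration.

```ruby
# Hypothetical patterns for illustration only; these are not taken from
# ReferrerCop's real blacklist or whitelist.
BLACKLIST = [/cheap-pills\.example/i, /casino/i].freeze
WHITELIST = [%r{\Ahttps?://news\.example\.org/}i].freeze

# Returns :ham or :spam for a referrer URL.
# Assumption: a whitelist match takes precedence over any blacklist match.
def classify(url, blacklist = BLACKLIST, whitelist = WHITELIST)
  return :ham if whitelist.any? { |re| re =~ url }
  return :spam if blacklist.any? { |re| re =~ url }
  :ham
end

puts classify("http://cheap-pills.example/buy")         # matches the blacklist
puts classify("http://news.example.org/casino-review")  # whitelisted despite "casino"
puts classify("http://wonko.com/")                      # matches neither list
```

Checking the whitelist first is one plausible design: it lets a site owner rescue a legitimate referrer that happens to match a broad blacklist pattern.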
If you’re a Ruby programmer, you can easily integrate ReferrerCop’s functionality into your own tools. See the API documentation for details.
Requirements
- Ruby 1.8.2 or higher
Downloads
Latest Version
- referrercop-1.1.0.tar.gz (02/26/2006)
For your convenience, ReferrerCop is also available in the FreeBSD ports collection as textproc/referrercop.
Previous Versions
- referrercop-1.0.4.tar.gz (10/17/2005)
- referrercop-1.0.3.tar.gz (10/06/2005)
- referrercop-1.0.2.tar.gz (06/17/2005)
- referrercop-1.0.1.tar.gz (06/09/2005)
- referrercop-1.0.0.tar.gz (06/04/2005)
SVN
The latest development version can always be found in the SVN repository.
Usage
referrercop [-f | -i | -n | -s] [options] [<file> ...]
referrercop -u <url> [options]
referrercop -U [options]
referrercop {-h | -V}

Modes:
  -f, --filter            Filter the specified files (or standard input if no
                          files are specified), sending the results to
                          standard output. This is the default mode.
  -i, --in-place          Filter the specified files in place, replacing each
                          file with the filtered version. A backup of the
                          original file will be created with a .bak extension.
  -n, --extract-ham       Extract ham (nonspam) URLs from the input data and
                          send them to standard output. Duplicates will be
                          suppressed.
  -s, --extract-spam      Extract spam URLs from the input data and send
                          them to standard output. Duplicates will be
                          suppressed.
  -u, --url <url>         Test the specified URL.
  -U, --update            Check for an updated version of the default
                          blacklist and download it if available.

Options:
  -b, --blacklist <file>  Blacklist to use instead of the default list.
  -c, --config <file>     Use the specified config file.
  -v, --verbose           Print verbose status and statistical info to stderr.
  -w, --whitelist <file>  Whitelist to use instead of the default list.

Information:
  -h, --help              Display usage information (this message).
  -V, --version           Display version information.
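The in-place mode described above (replace the file, keep the original with a .bak extension) can be sketched in Ruby roughly as follows. This is an illustrative approximation, not ReferrerCop's implementation: filter_in_place and SPAM_PATTERN are hypothetical names, and a single regular expression stands in for the real blacklist.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for a real blacklist match.
SPAM_PATTERN = /casino/i

# Copies path to path.bak, then rewrites path with only the lines that do
# not match the pattern. Returns the number of lines removed.
def filter_in_place(path, pattern = SPAM_PATTERN)
  backup = "#{path}.bak"
  FileUtils.cp(path, backup) # keep the original as <file>.bak
  kept, dropped = File.readlines(backup).partition { |line| line !~ pattern }
  File.write(path, kept.join)
  dropped.size
end

# Demo against a throwaway file in a temporary directory.
Dir.mktmpdir do |dir|
  log = File.join(dir, "access.log")
  File.write(log, "ref http://wonko.com/\nref http://casino.example/\n")
  removed = filter_in_place(log)
  puts "removed #{removed} line(s); backup at #{File.basename(log)}.bak"
end
```

Writing the backup before touching the original means an interrupted run still leaves an unmodified copy on disk.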
Examples
Filter an Apache log file using the default blacklist
# referrercop /var/log/httpd-access.log > filtered.log
Filter an Apache log file and display statistics
# referrercop -v /var/log/httpd-access.log > filtered.log
ReferrerCop v1.0.4 <http://wonko.com/software/referrercop/>
Copyright (c) 2005 Ryan Grove <ryan@wonko.com>.
Using blacklist ./blacklist.refcop
Compiled 37 blacklist patterns.
Input type: Apache combined log file
Processed 23605 lines in 1.63815021514893s (14410 lines per second)
23142 ham, 463 spam, 0 invalid
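The ham, spam, and invalid counts in the verbose output can be approximated with a short Ruby sketch. Everything here is an assumption for illustration rather than ReferrerCop's own logic: the COMBINED regular expression is a simplified match for the Apache combined log format (it does not handle escaped quotes), SPAM is a hypothetical one-pattern blacklist, and "invalid" is taken to mean a line that fails to parse.

```ruby
# Simplified pattern for the Apache combined log format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Capture group 1 is the referrer. Escaped quotes are not handled.
COMBINED = /\A\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"\s*\z/

# Hypothetical stand-in for a real blacklist.
SPAM = /casino/i

# Counts lines as ham, spam, or invalid (assumed here to mean unparseable).
def tally(lines, spam = SPAM)
  counts = { ham: 0, spam: 0, invalid: 0 }
  lines.each do |line|
    match = COMBINED.match(line)
    if match.nil?
      counts[:invalid] += 1
    elsif match[1] =~ spam
      counts[:spam] += 1
    else
      counts[:ham] += 1
    end
  end
  counts
end

sample = [
  '203.0.113.5 - - [17/Jun/2005:10:00:00 -0700] "GET / HTTP/1.1" 200 512 "http://wonko.com/" "Mozilla/5.0"',
  '203.0.113.9 - - [17/Jun/2005:10:00:01 -0700] "GET / HTTP/1.1" 200 512 "http://casino.example/" "Mozilla/5.0"',
  'not a log line'
]
counts = tally(sample)
puts "#{counts[:ham]} ham, #{counts[:spam]} spam, #{counts[:invalid]} invalid"
```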
Filter an AWStats data file using a custom whitelist
# referrercop -w whitelist.txt /var/cache/awstats/awstats062005.txt > filtered.txt
Filter several Apache log files in place
# referrercop -i /var/log/foo.com-access.log /var/log/bar.com-access.log
Filter Apache log files in place by wildcard
# referrercop -i /var/log/*-access.log
Display the status of a single URL
# referrercop -u http://wonko.com/
Ham
Copyright
Copyright © 2005-2006 Ryan Grove.
ReferrerCop is open source software distributed under the terms of the GPL.