ReferrerCop

ReferrerCop parses Apache log files and AWStats data files and removes entries whose referring URLs match a list of known referrer spammers. By default, it writes the filtered data to standard output. Run your Apache logs through ReferrerCop before they’re processed by AWStats or any other log analyzer, or run it on AWStats data files after AWStats has processed the logs, to eliminate referrer spam from your web statistics.

Filtering is performed using a blacklist and an optional whitelist. ReferrerCop comes with a large, frequently updated blacklist.
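
To illustrate how a blacklist and an optional whitelist can work together, here is a minimal sketch in modern Ruby (2.4 or later). It is not ReferrerCop's actual code or API: the classify method, the sample patterns, and the rule that a whitelist match overrides the blacklist are all assumptions made for the example.

```ruby
# Hypothetical sketch; NOT ReferrerCop's implementation or API.
# Assumes each list is an array of regular expressions and that a
# whitelist match takes precedence over a blacklist match.

# Returns :spam when the URL matches a blacklist pattern and no
# whitelist pattern; returns :ham otherwise.
def classify(url, blacklist, whitelist = [])
  return :ham if whitelist.any? { |pattern| pattern.match?(url) }
  return :spam if blacklist.any? { |pattern| pattern.match?(url) }
  :ham
end

# Made-up patterns for illustration only.
blacklist = [/casino/i, /cheap-pills\.example/i]
whitelist = [/\Ahttps?:\/\/casino-history\.example\//i]

puts classify('http://best-casino.example/win', blacklist, whitelist)
puts classify('http://casino-history.example/article', blacklist, whitelist)
puts classify('http://wonko.com/', blacklist, whitelist)
```

In this sketch the whitelist is checked first, so a legitimate site that happens to match a broad blacklist pattern is still treated as ham.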

If you’re a Ruby programmer, you can easily integrate ReferrerCop’s functionality into your own tools. See the API documentation for details.

Requirements

  • Ruby 1.8.2 or higher

Downloads

Latest Version

For your convenience, ReferrerCop is also available in the FreeBSD ports collection as textproc/referrercop.

SVN

The latest development version can always be found in the SVN repository.

Usage

referrercop [-f | -i | -n | -s] [options] [<file> ...]
referrercop -u <url> [options]
referrercop -U [options]
referrercop {-h | -V}

Modes:

 -f, --filter             Filter the specified files (or standard input if no
                          files are specified), sending the results to
                          standard output. This is the default mode.
 -i, --in-place           Filter the specified files in place, replacing each
                          file with the filtered version. A backup of the
                          original file will be created with a .bak extension.
 -n, --extract-ham        Extract ham (nonspam) URLs from the input data and
                          send them to standard output. Duplicates will be
                          suppressed.
 -s, --extract-spam       Extract spam URLs from the input data and send
                          them to standard output. Duplicates will be
                          suppressed.
 -u, --url <url>          Test the specified URL.
 -U, --update             Check for an updated version of the default
                          blacklist and download it if available.

Options:

 -b, --blacklist <file>   Blacklist to use instead of the default list.
 -c, --config <file>      Use the specified config file.
 -v, --verbose            Print verbose status and statistical info to stderr.
 -w, --whitelist <file>   Whitelist to use instead of the default list.

Information:

 -h, --help               Display usage information (this message).
 -V, --version            Display version information.

Examples

Filter an Apache log file using the default blacklist

# referrercop /var/log/httpd-access.log > filtered.log

Filter an Apache log file and display statistics

# referrercop -v /var/log/httpd-access.log > filtered.log
ReferrerCop v1.0.4 <http://wonko.com/software/referrercop/>
Copyright (c) 2005 Ryan Grove <ryan@wonko.com>.

Using blacklist ./blacklist.refcop
Compiled 37 blacklist patterns.
Input type: Apache combined log file
Processed 23605 lines in 1.63815021514893s (14410 lines per second)
23142 ham, 463 spam, 0 invalid
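
A tally like the ham/spam/invalid line above could be produced along the following lines. This is only a sketch in modern Ruby (2.4 or later), not ReferrerCop's code: the COMBINED_LOG regular expression, the tally method, and the sample log lines are assumptions, and the pattern handles only simple Apache combined log lines without escaped quotes.

```ruby
# Hypothetical sketch; NOT ReferrerCop's implementation or API.
# Matches a simple Apache combined log line and captures the referrer,
# which is the second-to-last quoted field.
COMBINED_LOG = /\A\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?<referrer>[^"]*)" "[^"]*"\s*\z/

# Counts lines as ham, spam, or invalid (not parseable as combined log).
def tally(lines, blacklist)
  counts = { ham: 0, spam: 0, invalid: 0 }
  lines.each do |line|
    match = COMBINED_LOG.match(line)
    if match.nil?
      counts[:invalid] += 1
    elsif blacklist.any? { |pattern| pattern.match?(match[:referrer]) }
      counts[:spam] += 1
    else
      counts[:ham] += 1
    end
  end
  counts
end

# Made-up sample data for illustration only.
lines = [
  '203.0.113.7 - - [10/Jun/2005:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "http://wonko.com/" "Mozilla/5.0"',
  '198.51.100.9 - - [10/Jun/2005:13:56:01 -0700] "GET / HTTP/1.1" 200 2326 "http://spam-casino.example/" "Mozilla/5.0"',
  'not a log line'
]

counts = tally(lines, [/spam-casino\.example/i])
puts "#{counts[:ham]} ham, #{counts[:spam]} spam, #{counts[:invalid]} invalid"
```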

Filter an AWStats data file using a custom whitelist

# referrercop -w whitelist.txt /var/cache/awstats/awstats062005.txt > filtered.txt

Filter several Apache log files in place

# referrercop -i /var/log/foo.com-access.log /var/log/bar.com-access.log

Filter Apache log files in place by wildcard

# referrercop -i /var/log/*-access.log
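
The in-place behavior (replace each file with its filtered version and keep the original with a .bak extension) can be sketched in Ruby as follows. This is an illustration under stated assumptions, not ReferrerCop's code: the filter_in_place method is hypothetical, it requires Ruby 2.4 or later, and for brevity it matches patterns against the whole line rather than only the referrer field.

```ruby
# Hypothetical sketch; NOT ReferrerCop's implementation or API.
require 'fileutils'
require 'tmpdir'

# Copies the file to "<path>.bak", then rewrites the original with only
# the lines that match no blacklist pattern. Returns the backup path.
def filter_in_place(path, blacklist)
  backup = "#{path}.bak"
  FileUtils.cp(path, backup)
  kept = File.foreach(backup).reject do |line|
    blacklist.any? { |pattern| pattern.match?(line) }
  end
  File.write(path, kept.join)
  backup
end

# Demonstration on a throwaway file with made-up contents.
Dir.mktmpdir do |dir|
  log = File.join(dir, 'access.log')
  File.write(log, "good line\nspam-casino.example line\n")
  filter_in_place(log, [/spam-casino\.example/])
  puts File.read(log)             # filtered contents
  puts File.exist?("#{log}.bak")  # whether the backup was kept
end
```

Creating the backup before writing means the original data is still recoverable if the filtered output turns out to be wrong.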

Display the status of a single URL

# referrercop -u http://wonko.com/
Ham

Copyright

Copyright © 2005-2006 Ryan Grove.
ReferrerCop is open source software distributed under the terms of the GPL.