Created March 9, 2020 20:05
Scripts referenced during my Code4lib 2020 conference talk.

awk scripts

to be added later.

rack-attack throttling policies

Two commits, because I had to experiment to find what worked.


These scripts have banned over 120,000 IPs across their lifetime, though most are not presently banned.

I've removed a couple of values specific to our configuration and indicated where you should replace them with values that make sense in your context.

nginx-lumen-abuse catches IPs that violate a throttling policy; recidive-lumen catches repeat offenders. lumen is the name of our app so you probably want to replace that too :).


enabled  = true
port     = http,https
filter   = nginx-lumen-abuse
logpath  = $YOUR_LOG_PATH_HERE
# A host is banned if it has generated "maxretry" matching log lines during the last "findtime" seconds.


enabled  = true
filter   = recidive-lumen
logpath  = $YOUR_LOG_PATH_HERE
action   = iptables-multiport[name=recidive-lumen,port="http,https"]


# Fail2Ban filter to match web requests for selected URLs

failregex = ^(www\.)? <HOST> \- \S+ \[ \S+\] \"GET $YOUR_HOT_PATH_HERE \S+ \- \-\" (200|429) .+$
            ^(www\.)? <HOST> \- \S+ \[ \S+\] \"GET $YOUR_OTHER_HOT_PATH_HERE \S+ \- \-\" (200|429) .+$

ignoreregex =

# DEV Notes:
# Based on apache-botsearch filter
# Author: Frantisek Sumsal


# Fail2Ban filter for repeat bans
# This filter monitors the fail2ban log file, and enables you to add long
# time bans for ip addresses that get banned by fail2ban multiple times.
# Reasons to use this: block very persistent attackers for a longer time,
# stop receiving email notifications about the same attacker over and
# over again.
# This jail is only useful if you set the 'findtime' and 'bantime' parameters
# in jail.conf to a higher value than the other jails. Also, this jail has its
# drawbacks, namely in that it works only with iptables, or if you use a
# different blocking mechanism for this jail versus others (e.g. hostsdeny
# for most jails, and shorewall for this one).


# Read common prefixes. If any customizations available -- read them from
# common.local
before = common.conf


_daemon = fail2ban\.actions

# The name of the jail that this filter is used for. In jail.conf, name the
# jail using this filter 'recidive', or change this line!
_jailname = recidive-lumen

# example log line
#2019-04-03 13:49:51,332 fail2ban.actions: WARNING [nginx-lumen-abuse] Ban $EXAMPLE_DODGY_IP

failregex = ^(%(__prefix_line)s|,\d{3} fail2ban.actions:\s+)WARNING\s+\[(?!%(_jailname)s\])(?:.*)\]\s+Ban\s+<HOST>\s*$

ignoreregex =

# Author: Tom Hendrikx, modifications by Amir Caspi

Additionally, note that fail2ban needs to be configured to allow large enough logs for the scripts' needs.


Here are the awk scripts I referenced. The field numbers ($2 and similar) may need to change depending on the format of your log; each should reference the column containing the data of interest (e.g. the second column in my log lines holds the IP address).

I used these as ways to "read" my logs quickly, looking for data to inform future judgments (e.g. are there IP ranges I should ban outright; which paths on our site need to be throttled).
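If you are not sure which column number holds the IP or the path in your own logs, one quick check is to print every field of a line next to its index. A minimal sketch, using a made-up log line whose layout is only an assumption (yours will likely differ):

```shell
# Hypothetical sample line; swap in a real line from your own log file.
sample='www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:01 +0000] "GET /search HTTP/1.1 - -" 200 512'

# Print each whitespace-separated field with its position, so you can see
# which $N to use in the awk commands.
fields=$(printf '%s\n' "$sample" | awk '{ for (i = 1; i <= NF; i++) print i, $i }')
printf '%s\n' "$fields"
```

In this made-up layout the IP lands in field 2 and the path in field 8, which is why the commands here use $2 and $8.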

Find the IPs that hit you most often:

awk '{ print $2}' $PATH_TO_YOUR_LOGFILE | sort | uniq -c | sort -nr | head -n 20
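To see what that pipeline produces, here is a small sketch that runs it against a throwaway log. The log lines, IPs, and column layout are invented for illustration; only the pipeline itself comes from the command above.

```shell
# Build a tiny, made-up log whose second column is the client IP
# (an assumption; adjust $2 for your own format).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:01 +0000] "GET /search HTTP/1.1 - -" 200 512
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:02 +0000] "GET /search HTTP/1.1 - -" 200 512
www.example.org 198.51.100.7 - - [01/Apr/2019:12:10:03 +0000] "GET /about HTTP/1.1 - -" 200 128
www.example.org 203.0.113.5 - - [01/Apr/2019:12:11:00 +0000] "GET /search HTTP/1.1 - -" 429 0
EOF

# Count requests per IP, busiest first, then keep only the IP of the top entry.
top_ip=$(awk '{ print $2 }' "$LOG" | sort | uniq -c | sort -nr | head -n 1 | awk '{ print $2 }')
echo "$top_ip"
rm -f "$LOG"
```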

Count hits per IP over a time range:

grep "01/Apr/2019:12:10" $PATH_TO_YOUR_LOGFILE |awk '{print $2}' |sort |uniq -c |sort -n

Modify the grep for your date/time. Note that you can look at a single second, or a ten-minute range, or an hour, et cetera, by changing how much of the time you write out.
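A rough illustration of how the length of the timestamp prefix sets the window, again on an invented log (the timestamp format is assumed to look like the one in the grep above):

```shell
# Made-up log lines; only the timestamps matter for this example.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:01 +0000] "GET /search HTTP/1.1 - -" 200 512
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:59 +0000] "GET /search HTTP/1.1 - -" 200 512
www.example.org 203.0.113.5 - - [01/Apr/2019:12:11:00 +0000] "GET /search HTTP/1.1 - -" 200 512
www.example.org 203.0.113.5 - - [01/Apr/2019:12:25:00 +0000] "GET /search HTTP/1.1 - -" 200 512
www.example.org 203.0.113.5 - - [01/Apr/2019:13:00:00 +0000] "GET /search HTTP/1.1 - -" 200 512
EOF

# The more of the timestamp you spell out, the narrower the window.
one_second=$(grep -c "01/Apr/2019:12:10:01" "$LOG")   # a single second
one_minute=$(grep -c "01/Apr/2019:12:10" "$LOG")      # 12:10:00 to 12:10:59
ten_minutes=$(grep -c "01/Apr/2019:12:1" "$LOG")      # 12:10 to 12:19
one_hour=$(grep -c "01/Apr/2019:12" "$LOG")           # the whole 12:00 hour
echo "$one_second $one_minute $ten_minutes $one_hour"
rm -f "$LOG"
```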

Find 500 errors:

grep " 500" $PATH_TO_YOUR_LOGFILE | awk '{print $2}'

This prints IP addresses which get 500 errors. The quotes and the space in the grep are important -- otherwise you'll find IP addresses and site URLs which contain the substring "500".
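Even with the leading space, a substring match can still pick up other numeric columns, such as a response size that starts with 500. If that turns out to matter for your logs, one possible alternative is to compare the status column itself in awk. In the sketch below the log lines are invented and the status is assumed to be field 12; check your own format before relying on that number.

```shell
# Made-up log: one real 500, one 200 whose size is 5003, one 200 for a path ending in 500.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:01 +0000] "GET /search HTTP/1.1 - -" 500 0
www.example.org 198.51.100.7 - - [01/Apr/2019:12:10:02 +0000] "GET /about HTTP/1.1 - -" 200 5003
www.example.org 192.0.2.9 - - [01/Apr/2019:12:10:03 +0000] "GET /item/500 HTTP/1.1 - -" 200 128
EOF

# Substring match: also counts the line whose size column begins with 500.
grep_hits=$(grep -c " 500" "$LOG")

# Column match: only lines whose status field (assumed to be $12) is exactly 500.
awk_ips=$(awk '$12 == 500 { print $2 }' "$LOG")
echo "grep matched $grep_hits lines; awk matched: $awk_ips"
rm -f "$LOG"
```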

Find the pages on your site that are successfully fetched (i.e. HTTP 200) most often:

grep " 200" $PATH_TO_YOUR_LOGFILE | awk '{print $8}' | sort | uniq -c | sort -n

(...because if I were an attacker I'd find the slow pages and hit them repeatedly.)

Find the most common 429 errors (by IP and path):

grep " 429" $PATH_TO_YOUR_LOGFILE | awk '{print $2, $8}' | sort | uniq -c | sort -n

Useful if you're returning 429s as part of a throttling policy -- find the IPs that get throttled a lot.
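As a sketch of what this looks like in practice, the following filters an invented log down to 429 responses and counts them per IP and path. The sample lines and the column positions ($2 for the IP, $8 for the path) are assumptions for illustration.

```shell
# Made-up log: one IP is throttled twice on /search, another once.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:01 +0000] "GET /search HTTP/1.1 - -" 200 512
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:02 +0000] "GET /search HTTP/1.1 - -" 429 0
www.example.org 203.0.113.5 - - [01/Apr/2019:12:10:03 +0000] "GET /search HTTP/1.1 - -" 429 0
www.example.org 198.51.100.7 - - [01/Apr/2019:12:10:04 +0000] "GET /search HTTP/1.1 - -" 429 0
EOF

# Keep only 429 lines, count each (IP, path) pair, and show the most throttled pair.
worst=$(grep " 429" "$LOG" | awk '{ print $2, $8 }' | sort | uniq -c | sort -nr | head -n 1 | awk '{ print $1, $2, $3 }')
echo "$worst"
rm -f "$LOG"
```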
