
@thatandromeda
Created March 9, 2020 20:05
Scripts referenced during my Code4lib 2020 conference talk.

awk scripts

Added in the comments below.

rack-attack throttling policies

Two commits, because I had to experiment to find what worked:
https://github.com/berkmancenter/lumendatabase/commit/0e414fcba54851a615dee73f5c36624063e6d4f7
https://github.com/berkmancenter/lumendatabase/commit/2877ea1c05850715405340a7c8680056653a4876

fail2ban

These scripts have banned over 120,000 IPs across their lifetime, though most are not presently banned.

I've removed a couple of values specific to our configuration and indicated where you should replace them with values that make sense in your context.

nginx-lumen-abuse catches IPs that violate a throttling policy; recidive-lumen catches repeat offenders. lumen is the name of our app so you probably want to replace that too :).

nginx-lumen-abuse

enabled  = true
port     = http,https
filter   = nginx-lumen-abuse
logpath  = $YOUR_LOG_PATH_HERE
bantime  = $BAN_LENGTH_IN_SECONDS
# A host is banned if it has generated "maxretry" failures during the last "findtime" seconds.
findtime = $INTERVAL_IN_SECONDS
maxretry = $NUMBER_THAT_MAKES_SENSE_IN_YOUR_CONTEXT

recidive-lumen

enabled  = true
filter   = recidive-lumen
logpath  = $YOUR_LOG_PATH_HERE
action   = iptables-multiport[name=recidive-lumen,port="http,https"]
bantime  = $BAN_LENGTH_IN_SECONDS
findtime = $INTERVAL_IN_SECONDS
maxretry = $NUMBER_THAT_MAKES_SENSE_IN_YOUR_CONTEXT
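
After adding the jails, a quick sanity check (assuming a standard fail2ban install; both commands ship with fail2ban):

# reload the configuration, then confirm each jail is live
sudo fail2ban-client reload
sudo fail2ban-client status nginx-lumen-abuse
sudo fail2ban-client status recidive-lumen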

/etc/fail2ban/filter.d/nginx-lumen-abuse.conf

# Fail2Ban filter to match web requests for selected URLs
#
[Definition]

failregex = ^(www\.)?lumendatabase\.org <HOST> \- \S+ \[ \S+\] \"GET $YOUR_HOT_PATH_HERE \S+ \- \-\" (200|429) .+$
            ^(www\.)?lumendatabase\.org <HOST> \- \S+ \[ \S+\] \"GET $YOUR_OTHER_HOT_PATH_HERE \S+ \- \-\" (200|429) .+$

ignoreregex =

# DEV Notes:
# Based on apache-botsearch filter
#
# Author: Frantisek Sumsal
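
Before enabling the jail, you can dry-run a filter against a real log with fail2ban's bundled tester; the log path is the same placeholder as in the jail config:

fail2ban-regex $YOUR_LOG_PATH_HERE /etc/fail2ban/filter.d/nginx-lumen-abuse.conf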

/etc/fail2ban/filter.d/recidive-lumen.conf

# Fail2Ban filter for repeat bans
#
# This filter monitors the fail2ban log file, and enables you to add long
# time bans for ip addresses that get banned by fail2ban multiple times.
#
# Reasons to use this: block very persistent attackers for a longer time,
# stop receiving email notifications about the same attacker over and
# over again.
#
# This jail is only useful if you set the 'findtime' and 'bantime' parameters
# in jail.conf to a higher value than the other jails. Also, this jail has its
# drawbacks, namely in that it works only with iptables, or if you use a
# different blocking mechanism for this jail versus others (e.g. hostsdeny
# for most jails, and shorewall for this one).

[INCLUDES]

# Read common prefixes. If any customizations available -- read them from
# common.local
before = common.conf

[Definition]

_daemon = fail2ban\.actions

# The name of the jail that this filter is used for. In jail.conf, name the
# jail using this filter 'recidive-lumen', or change this line!
_jailname = recidive-lumen

# example log line
#2019-04-03 13:49:51,332 fail2ban.actions: WARNING [nginx-lumen-abuse] Ban $EXAMPLE_DODGY_IP

failregex = ^(%(__prefix_line)s|,\d{3} fail2ban\.actions:\s+)WARNING\s+\[(?!%(_jailname)s\])(?:.*)\]\s+Ban\s+<HOST>\s*$

ignoreregex =

# Author: Tom Hendrikx, modifications by Amir Caspi
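
If a ban turns out to be a false positive, it can be lifted by hand (the IP below is just a placeholder):

sudo fail2ban-client set recidive-lumen unbanip 192.0.2.1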
@thatandromeda (author):

Additionally, note that fail2ban needs to be configured to keep logs large (and long-lived) enough for these scripts' needs; in particular, the recidive filter reads fail2ban's own log, so that log must retain at least a findtime's worth of history.
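
One quick check, assuming fail2ban's default log location: the timestamp on the oldest line in the current log should be older than your longest findtime.

head -n 1 /var/log/fail2ban.log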

@thatandromeda (author):

Here are the awk scripts I referenced. The numbers ($2 and similar) might have to be changed depending on the format of your log; they should reference the column containing the data of interest (e.g. the second field in my log lines is where IP addresses go).

I used these as ways to "read" my logs quickly, looking for data to inform future judgments (e.g. are there IP ranges I should ban outright; which paths on our site need to be throttled).

Find the IPs that hit you most often:

awk '{ print $2}' $PATH_TO_YOUR_LOGFILE | sort | uniq -c | sort -nr | head -n 20
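
To chase the IP-range question mentioned above, a sketch that groups the same counts by /24 block (it assumes, like the other scripts here, that the IP is in field $2):

awk '{ split($2, o, "."); print o[1] "." o[2] "." o[3] ".0/24" }' $PATH_TO_YOUR_LOGFILE | sort | uniq -c | sort -nr | head -n 20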

Count hits per IP over a time range:

grep "01/Apr/2019:12:10" $PATH_TO_YOUR_LOGFILE |awk '{print $2}' |sort |uniq -c |sort -n

Modify the grep for your date/time. Note that you can look at a single second, or a ten-minute range, or an hour, et cetera, by changing how much of the time you write out.
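
For instance, widening the same command from one minute to a whole hour:

grep "01/Apr/2019:12" $PATH_TO_YOUR_LOGFILE | awk '{print $2}' | sort | uniq -c | sort -n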

Find 500 errors:

less $PATH_TO_YOUR_LOGFILE | grep " 500" | awk '{print $2}'

This prints IP addresses which get 500 errors. The quotes and the space in the grep are important -- otherwise you'll find IP addresses and site URLs which contain the substring "500".
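
A small extension that also counts how many 500s each IP triggered:

grep " 500" $PATH_TO_YOUR_LOGFILE | awk '{print $2}' | sort | uniq -c | sort -nr | head -n 20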

Find the pages on your site that are successfully fetched (i.e. http 200) most often:

less $PATH_TO_YOUR_LOGFILE | grep " 200" | awk '{print $8}' | sort | uniq -c | sort -n

(...because if I were an attacker I'd find the slow pages and hit them repeatedly.)

Find the most common 429 errors (by IP and path):

grep " 429" $PATH_TO_YOUR_LOGFILE | awk '{print $2, $8}' | sort | uniq -c | sort -n

Useful if you're returning 429s as part of a throttling policy -- find the IPs that get throttled a lot.
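
A variant that drops the IP column shows which paths trip the throttle most often, which is useful for tuning the rack-attack policies above:

grep " 429" $PATH_TO_YOUR_LOGFILE | awk '{print $8}' | sort | uniq -c | sort -n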
