Making a Non-Spamming Alert using Diff

Being on pager duty is a downer. And it’s a big mistake that ruins the on-call or first-responder position when you allocate no-no time to actually invest in a useful reporting platform. Like many before me, I used to create exactly that evil: write a bash script, and throw it in cron.

Example of the Wrong Method

MAILTO=pager.list@example.com
*/5 * * * * curl http://www.example.com/ || echo "website down"

There’s a pretty good technique to reduce the occurrence of duplicate pages, and that’s to use diff. Let’s work through an example of checking a web page for errors.

PAGE_1="http://localhost/example.php"
PREV_DIR=/var/spool/pager_prev
RECENT_DIR=/var/spool/pager_recent
REF_DIR=/var/spool/pager_refs
REF_PAGE="$REF_DIR/example.txt"
THRESHOLD_DELTA=3    # adjust this to 
DO_ALERT=0
ALERT_FILE=/tmp/example.mail
MAILTO=server.alerts@example.com

That sets up a series of directories: things presently getting tested get downloaded into the recent directory. Results from the previous test get moved to the previous directory.

[ -f $RECENT_DIR/example.txt ]   && mv $RECENT_DIR/example.txt $PREV_DIR
[ ! -f $RECENT_DIR/example.err ] && rm -f $PREV_DIR/example.err
[ -f $RECENT_DIR/example.err ]   && mv $RECENT_DIR/example.err $PREV_DIR

That did the basic move for every previous page request and error determination.

curl -o $RECENT_DIR/example.txt "$PAGE_1" &> $RECENT_DIR/example.err
[ $? -ne 0 -a ! -f $PREV_DIR/example.err] && DO_ALERT=1

That will tell us if this was the first time we haven’t been able to finish a page request.

diff $PREV_DIR/example.txt $RECENT_DIR/example.txt > $RECENT_DIR/example.diff
if [ $? -ne 0 ] ; then
   $lines = $( wc -l < /tmp/example.diff )
   [ $lines -gt $THRESHOLD_DELTA ] && DO_ALERT=1
fi

That recorded the change in the page if there was one. The end of the page might still say “java.IOException” but we don’t need to get paged unless that error actually changes.

the_diff=`head $RECENT_DIR/example.diff`
the_error=''
[ -f $RECENT_DIR/example.err -a `wc -l < $RECENT_DIR/example.err` -gt 0 ] && the_error=`cat $RECENT_DIR/example.err`
if [ $DO_ALERT -ne 0 ] ; then
      mail -s "[monitor] example.diff changed" $MAILTO <<

And that emails us the results of a change in page structure.

Also check out...

Jed Reynolds
Jed Reynolds has been known to void warranties and super glue his fingers together. When he is not doing photography or fixing his bike, he can be found being a grey beard programmer analyst for Candela Technologies. Start stalking him at https://about.me/jed_reynolds.
  • Jerker Montelius

    Or you cold install nagios or Zabbix an get it all.