Loopback Mountain: July 2014

Recently I did a Packet Pushers episode about log management. In it, I mentioned some of the custom Python scripts that I run to do basic syslog analysis, and someone asked about them in the comments.

The script I'm presenting here isn't one of the actual ones that I run in production, but it's close. The real one sends emails, does DNS lookups, keeps a "rare messages" database using sqlite3, and a few other things, but I wanted to keep this simple.

One of the problems I see with getting started with log analysis is that people tend to approach it like a typical vendor RFP project: list some requirements, survey the market, evaluate and buy a product to fit your requirements. Sounds good, right? The problem with log analysis is that often you don't know what your requirements really are until you start looking at data.

A simple message counting script like this lets you look at your data, and provides a simple platform on which you can start to iterate to find your specific needs. It also lets us look at some cool Python features.

I don't recommend pushing this too far: once you have a decent idea of what your data looks like and what you want to do with it, set up Logstash, Graylog2, or a similar commercial product like Splunk (if you can afford it).

That said, here's the Python:

I tried to make this as self-documenting as possible. You run it from the CLI with a syslog file as the argument, and you get this:

$ python simple_syslog_count.py sample.txt
214   SEC-6-IPACCESSLOGP
15    SEC-6-IPACCESSLOGRL
10    LINEPROTO-5-UPDOWN
10    LINK-3-UPDOWN
7     USER-3-SYSTEM_MSG
4     STACKMGR-4-STACK_LINK_CHANGE
4     DUAL-5-NBRCHANGE
3     IPPHONE-6-UNREGISTER_NORMAL
3     CRYPTO-4-PKT_REPLAY_ERR
3     SEC-6-IPACCESSLOGRP
3     SEC-6-IPACCESSLOGSP
2     SSH-5-SSH2_USERAUTH
2     SSH-5-SSH2_SESSION
2     SSH-5-SSH2_CLOSE

10.1.16.12

     6     SEC-6-IPACCESSLOGP

10.1.24.3

     2     LINEPROTO-5-UPDOWN
     2     LINK-3-UPDOWN

[Stuff deleted for brevity]

For Pythonistas, the script makes use of a few cool language features:

Named, Compiled Regexes

We can name a regex group with the (?P<name>PATTERN) syntax, which makes it easy to understand when it's referenced later with the .group('name') method on the match object.
This is demonstrated in lines 36-39 and 58-59 of the gist shown above.
It would be more efficient to capture these fields by splitting the line with the .split() string method, but I wanted the script to work for unknown field positions -- hence the regex.
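As a minimal sketch of the idea (not the actual code from the gist), here are two compiled regexes with named groups. The REPORTER pattern, the parse_line helper, and the sample line are assumptions made up for illustration; only the CISCO_MSG pattern mirrors the one used later in this post.

```python
import re

# Hypothetical patterns for illustration; the real script's regexes may differ.
# REPORTER matches the first dotted-quad IPv4 address anywhere in the line.
REPORTER = re.compile(r'(?P<reporter>\d{1,3}(?:\.\d{1,3}){3})')
# CISCO_MSG captures the text between '%' and the next ':'.
CISCO_MSG = re.compile(r'%(?P<msg>.*?):')


def parse_line(line):
    """Return (reporter, msg) for a syslog line, or None if either is missing."""
    reporter_match = REPORTER.search(line)
    msg_match = CISCO_MSG.search(line)
    if reporter_match is None or msg_match is None:
        return None
    # Named groups let us ask for fields by name instead of by position.
    return reporter_match.group('reporter'), msg_match.group('msg')


# Made-up sample line, not taken from a real log.
sample = 'Jul  1 12:00:00 10.1.16.12 42: %SEC-6-IPACCESSLOGP: list 101 denied tcp'
result = parse_line(sample)
print(result)
```

Because the regexes search the whole line, the fields can sit at any position, which is the trade-off against .split() mentioned above.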

Multiplication of Strings

We control indentation by multiplying the ' ' string (that is, a single space enclosed in quotes) by an integer value in the print_counter function (line 50).

The reason this works is that the Python str class defines a special __mul__ method that controls how the * operator works for objects of that class:
>>> 'foo'.__mul__(3)
'foofoofoo'
>>> 'foo' * 3
'foofoofoo'
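
Here is a rough sketch of how string multiplication can drive indentation when printing counts. The format_counter function and its indent parameter are hypothetical stand-ins, not the print_counter function from the gist, and the column width of 6 is an assumption chosen to resemble the sample output above.

```python
from collections import Counter


def format_counter(counter, indent=0):
    """Return lines like '214   SEC-6-IPACCESSLOGP', shifted right by `indent` spaces."""
    lines = []
    for msg, count in counter.most_common():
        # ' ' * indent builds the leading whitespace; the count is left-aligned
        # in a 6-character column so the message names line up.
        lines.append(' ' * indent + '{:<6}{}'.format(count, msg))
    return lines


overall = Counter({'SEC-6-IPACCESSLOGP': 214, 'LINK-3-UPDOWN': 10})
for line in format_counter(overall):
    print(line)

# Per-device counts are indented by five spaces under the device address.
for line in format_counter(Counter({'SEC-6-IPACCESSLOGP': 6}), indent=5):
    print(line)
```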

collections.Counter Objects

Counter objects are a subclass of dictionaries that know how to count things. Jeremy Schulman talked about these in a comment on the previous post. Here, we use Counters to build both the overall message counts and the per-device message counts:

>>> import re
>>> my_msg = 'timestamp ip_address stuff %MY-4-MESSAGE:other stuff'
>>> CISCO_MSG = re.compile('%(?P<msg>.*?):')
>>> from collections import Counter
>>> test_counter = Counter()
>>> this_msg = re.search(CISCO_MSG,my_msg).group('msg')
>>> this_msg
'MY-4-MESSAGE'
>>> test_counter[this_msg] += 1
>>> test_counter
Counter({'MY-4-MESSAGE': 1})
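
Two Counter behaviors are worth a quick sketch: missing keys read as zero instead of raising KeyError, and most_common() returns entries sorted by count. The message names below are made up for illustration.

```python
from collections import Counter

# Made-up message names standing in for parsed syslog output.
msgs = [
    'LINK-3-UPDOWN',
    'SEC-6-IPACCESSLOGP',
    'LINK-3-UPDOWN',
    'SEC-6-IPACCESSLOGP',
    'SEC-6-IPACCESSLOGP',
]

# A Counter can be built directly from any iterable of hashable items.
counts = Counter(msgs)

# most_common(n) returns the n highest counts as (item, count) tuples.
top = counts.most_common(1)

# Reading a key that was never counted gives 0 and does not add the key.
missing = counts['NOT-1-SEEN']

print(top, missing)
```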

collections.defaultdict Dictionaries

It can get annoying when you're assigning dictionary values inside a loop, because you get a KeyError when the key doesn't exist yet. This is a contrived example, but it illustrates the point:

>>> reporters = {}
>>> for reporter in ['1.1.1.1','2.2.2.2']:
...     reporters[reporter].append('foo')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
KeyError: '1.1.1.1'

To fix this, you can catch the exception:

>>> reporters = {}
>>> for reporter in ['1.1.1.1','2.2.2.2']:
...     try:
...         reporters[reporter].append('foo')
...         reporters[reporter].append('bar')
...     except KeyError:
...         reporters[reporter] = ['foo']
...         reporters[reporter].append('bar')

As usual, though, Python has a more elegant way in the collections module: defaultdict

>>> from collections import defaultdict
>>> reporters = defaultdict(list)
>>> for reporter in ['1.1.1.1','2.2.2.2']:
...     reporters[reporter].append('foo')
...     reporters[reporter].append('bar')
...
>>> reporters
defaultdict(<class 'list'>, {'1.1.1.1': ['foo', 'bar'], '2.2.2.2': ['foo', 'bar']})

In the syslog counter script, we use a collections.Counter object as the type for our defaultdict. This allows us to build a per-syslog-reporter dictionary that shows how many times each message appears for each reporter, while only looping through the input once (line 66):

per_reporter_counts[reporter][msg] += 1

Here, the dictionary per_reporter_counts has the IPv4 addresses of the syslog reporters as keys, with a Counter object as the value holding the counts for each message type:

>>> from collections import Counter,defaultdict
>>> per_reporter_counts = defaultdict(Counter)
>>> per_reporter_counts['1.1.1.1']['SOME-5-MESSAGE'] += 1
>>> per_reporter_counts
defaultdict(<class 'collections.Counter'>, {'1.1.1.1': Counter({'SOME-5-MESSAGE': 1})})
>>> per_reporter_counts['1.1.1.1']['SOME-5-MESSAGE'] += 5
>>> per_reporter_counts
defaultdict(<class 'collections.Counter'>, {'1.1.1.1': Counter({'SOME-5-MESSAGE': 6})})
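
To tie the pieces together, here is a rough sketch of a single pass over the input that fills both the overall Counter and the per-reporter defaultdict(Counter). This is not the gist itself: the count_messages function, the REPORTER pattern, and the sample lines are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical patterns; the real script's regexes may differ.
REPORTER = re.compile(r'(?P<reporter>\d{1,3}(?:\.\d{1,3}){3})')
CISCO_MSG = re.compile(r'%(?P<msg>.*?):')


def count_messages(lines):
    """Return (overall_counts, per_reporter_counts) from one pass over `lines`."""
    overall_counts = Counter()
    per_reporter_counts = defaultdict(Counter)
    for line in lines:
        reporter_match = REPORTER.search(line)
        msg_match = CISCO_MSG.search(line)
        if reporter_match is None or msg_match is None:
            continue  # skip lines without both a reporter and a Cisco-style message
        reporter = reporter_match.group('reporter')
        msg = msg_match.group('msg')
        overall_counts[msg] += 1
        # defaultdict creates an empty Counter the first time a reporter is seen.
        per_reporter_counts[reporter][msg] += 1
    return overall_counts, per_reporter_counts


# Made-up sample lines, not taken from a real log.
sample_lines = [
    'Jul  1 12:00:01 10.1.16.12 %SEC-6-IPACCESSLOGP: list 101 denied',
    'Jul  1 12:00:02 10.1.24.3 %LINK-3-UPDOWN: Interface Gi0/1, changed state to down',
    'Jul  1 12:00:03 10.1.24.3 %LINK-3-UPDOWN: Interface Gi0/1, changed state to up',
    'a line with no Cisco message',
]

overall, per_reporter = count_messages(sample_lines)
print(overall.most_common())
print(dict(per_reporter))
```

Since count_messages accepts any iterable of strings, an open file object can be passed in directly and the log never has to be loaded into memory all at once.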

If you got this far, you can go implement it for IPv6 addresses. :-)

Loopback Mountain

Tuesday, July 1, 2014

Simple Python Syslog Counter
