Friday, December 5, 2014

Imposing Artificial Limitations to Develop Skills

I'm a big fan of imposing artificial limitations on yourself in order to aid skill development. Here are some quick ideas:

  • When troubleshooting network devices from the CLI, try not to look at the configuration. Use only "show" or "debug" commands instead. I found this enormously beneficial when practicing for CCIE.
  • When troubleshooting larger operational issues or learning a new environment, try not to log into individual devices at all. Force yourself to use only your network management system, NetFlow, packet captures, or host-based tools like ping, traceroute, or nmap.
  • When learning automation or orchestration skills, force yourself to write scripts, run API calls, or use your favorite orchestration tool to do simple things, even if it doesn't seem like they merit the extra effort.

Tuesday, July 1, 2014

Simple Python Syslog Counter

Recently I did a Packet Pushers episode about log management. In it, I mentioned some of the custom Python scripts that I run to do basic syslog analysis, and someone asked about them in the comments.

The script I'm presenting here isn't one of the actual ones that I run in production, but it's close. The real one sends emails, does DNS lookups, keeps a "rare messages" database using sqlite3, and a few other things, but I wanted to keep this simple.

One of the problems I see with getting started with log analysis is that people tend to approach it like a typical vendor RFP project: list some requirements, survey the market, evaluate and buy a product to fit your requirements. Sounds good, right? The problem with log analysis is that often you don't know what your requirements really are until you start looking at data.

A simple message counting script like this lets you look at your data, and provides a simple platform on which you can start to iterate to find your specific needs. It also lets us look at some cool Python features.

I don't recommend pushing this too far: once you have a decent idea of what your data looks like and what you want to do with it, set up Logstash, Graylog2, or a similar commercial product like Splunk (if you can afford it).

That said, here's the Python:

I tried to make this as self-documenting as possible. You run it from the CLI with a syslog file as the argument, and you get this:

$ python sample.txt
 10    LINK-3-UPDOWN
 2     SSH-5-SSH2_CLOSE


     2     LINK-3-UPDOWN

[Stuff deleted for brevity]

For Pythonistas, the script makes use of a few cool language features:

Named, Compiled rRgexes

  • We can name a regex match with the (?PPATTERN) syntax, which makes it easy to understand it when it's referenced later with the .group('') method on the match object.
  • This is demonstrated in lines 36-39 and 58-59 of the gist shown above. 
  • It would be more efficient to capture these fields by splitting the line with the .split() string method, but I wanted the script to work for unknown field positions -- hence the regex. 

Multiplication of Strings

  • We control indentation by multiplying the ' ' string (that a single space enclosed in quotes) by an integer value in the print_counter function (line 50).
    • The reason this works is that the Python str class defines a special __mul__ method that controls how the * operator works for objects of that class:
      >>> 'foo'.__mul__(3)
      >>> 'foo' * 3

collections.Counter Objects

  • Counter objects are a subclass of dictionaries that know how to count things. Jeremy Schulman talked about these in a comment on the previous post. Here, we use Counters to build both the overall message counts and the per-device message counts:
>>> my_msg = 'timestamp ip_address stuff %MY-4-MESSAGE:other stuff'
>>> CISCO_MSG = re.compile('%(?P.*?):')
>>> from collections import Counter
>>> test_counter = Counter()
>>> this_msg =,my_msg).group('msg')
>>> this_msg
>>> test_counter[this_msg] += 1
>>> test_counter
Counter({'MY-4-MESSAGE': 1})

collections.defaultdict Dictionaries

  • It could get annoying when you're assigning dictionary values inside a loop, because you get errors when the key doesn't exist yet. This is a contrived example, but it illustrates the point:

    >>> reporters = {}
    >>> for reporter in ['','']:
    ...     reporters[reporter].append['foo']
    Traceback (most recent call last):
      File "", line 2, in
    KeyError: ''

  • To fix this, you can catch the exception:

    >>> reporters = {}
    >>> for reporter in ['','']:
    ...     try:
    ...         reporters[reporter].append['foo']
    ...         reporters[reporter].append['bar']
    ...     except KeyError:
    ...         reporters[reporter] = ['foo']
    ...         reporters[reporter].append('bar')
  • As usual, though, Python has a more elegant way in the collections module: defaultdict
>>> from collections import defaultdict
>>> reporters = defaultdict(list)
>>> for reporter in ['','']:
...     reporters[reporter].append('foo')
...     reporters[reporter].append('bar')
>>> reporters
defaultdict(, {'': ['foo', 'bar'], '': ['foo', 'bar']})
In the syslog counter script, we use a collections.Counter object as the type for our defaultdict. This allows us to build a per-syslog-reporter dictionary that shows how many times each message appears for each reporter, while only looping through the input once (line 66):

 per_reporter_counts[reporter][msg] += 1

Here, the dictionary per_reporter_counts has the IPv4 addresses of the syslog reporters as keys, with a Counter object as the value holding the counts for each message type:

>>> from collections import Counter,defaultdict
>>> per_reporter_counts = defaultdict(Counter)
>>> per_reporter_counts['']['SOME-5-MESSAGE'] += 1
>>> per_reporter_counts
defaultdict(, {'': Counter({'SOME-5-MESSAGE': 1})})
>>> per_reporter_counts['']['SOME-5-MESSAGE'] += 5
>>> per_reporter_counts
defaultdict(, {'': Counter({'SOME-5-MESSAGE': 6})})

If you got this far, you can go implement it for IPv6 addresses. :-)

Friday, June 20, 2014

Python Sets: Handy for Network Data

My Python-related posts seem to get the most reads, so here's another one!

A problem that comes up fairly often in networking is finding the number of occurrences of unique items in a large collection of data: let's say you want to find all of the unique IP addresses that accessed a website, traversed a firewall, got denied by an ACL, or whatever. Maybe you've extracted the following list from a log file:

and you need to reduce this to:

In other words, we're removing the duplicates. In low-level programming languages, removing duplicates is a bit of a pain: generally you need to implement an efficient way to sort an array of items, then traverse the sorted array to check for adjacent duplicates and remove them. In a language that has dictionaries (also known as hash tables or associative arrays), you can do it by adding each item as a key in your dictionary with an empty value, then extract the keys. In Python:

>>> items = ['','','','','','','','']
>>> d = {}
>>> for item in items:
...     d[item] = None
>>> d
{'': None, '': None, '': None, '': None}
>>> unique = d.keys()
>>> unique
['', '', '', '']

or, more concisely using a dictionary comprehension:

>>> {item:None for item in items}.keys()
['', '', '', '']

Python has an even better way, however: the "set" type, which emulates the mathematical idea of a set as a collection of distinct items. If you create an empty set and add items to it, duplicates will automatically be thrown away:

>>> s = set()
>>> s.add('')
>>> s
>>> s.add('')
>>> s.add('')
>>> s
set(['', ''])
>>> for item in items:
...     s.add(item)
>>> s
set(['', '', '', ''])

Predictably, you can use set comprehensions just like list comprehensions to do the same thing as a one liner:

>>> {item for item in items}
set(['', '', '', ''])

Or, if you have a list built already you can just convert it to a set:

>>> set(items)
set(['', '', '', ''])

Python also provides methods for the most common types of set operations: union, intersection, difference and symmetric difference. Because these methods accept lists or other iterables, you can quickly find similarities between collections of items:

>>> items
['', '', '', '', '', '', '', '']
>>> more_items = ['','','','','']
>>> set(items).intersection(more_items)
set(['', ''])

>>> set(items).difference(more_items)
set(['', ''])

Have fun!

Wednesday, April 2, 2014

Fun with Router IP Traffic Export and NSM

The Basics
I finally got around to setting up Security Onion (the best network security monitoring package available) to monitor my home network, only to discover that my Cisco 891 router doesn't support support the right form of SPAN. Here's how I worked around it. The topology looks like this:

The 891 router has an integrated 8-port switch module, so the simple case would have been a traditional SPAN setup; something like this:

! vlan 10 is the user VLAN
monitor session 1 source interface vlan 10
monitor session 1 destination interface FastEthernet0

with the server's monitoring NIC connected to FastEthernet0.

The problem is that the 891 doesn't support using a VLAN as a source interface, and because of the way the embedded WAP works, a physical source interface won't work either. Hence, I turned to an obscure feature that's helped me occasionally in the past: Router IP Traffic Export. This is a feature for IOS software platforms that enables you to enable SPAN-like functions for almost any source interface.

The configuration looks like this:

ip traffic-export profile RITE_MIRROR
  interface FastEthernet0
  mac-address 6805.ca21.2ddd

interface Vlan10
 ip traffic-export apply RITE_MIRROR

This takes all traffic routed across the Vlan10 SVI and sends it out the FastEthernet0 interface, rewriting the destination MAC address to the specified value. I used the MAC address of my monitoring NIC, but it shouldn't matter in this case because the monitoring NIC is directly attached. If I wanted to copy the traffic across a switched interface, it would matter.

My ESXi host (a low-cost machine from Zareason with 32GB of RAM) has two physical NICs; one for all of the regular VM traffic (using 802.1q to separate VLANs if needed) and one for monitoring. The monitoring pNIC is attached to a promiscuous mode vSwitch in ESXi, which in turn is connected to the monitoring vNIC on the Security Onion VM. The effect of this is identical to SPAN-ing all the traffic from VLAN 10 to my Security Onion monitoring system; I get Snort, Bro, Argus, and full packet capture with just the built-in software tools in IOS and ESXi.

Oddity: RITE Capture, Tunnels?
Interestingly, you can also use RITE to capture traffic to a RAM buffer and export it to a pcap file. I don't understand why you would use this instead of the much more flexible Embedded Packet Capture Feature, though.

Another thing I've wondered is whether you could use a L2 tunnel to send the mirrored traffic elsewhere in the network. The destination interface must be a physical Ethernet interface, but it would be interesting to try using a L2TPv3 tunnel from an Ethernet interface to another router--I have no idea if this would work.

Production Use?
Cisco makes the UCS E-series blades for ISR G2 routers that let you run a hypervisor on a blade inside your router chassis. These things have an external Ethernet port on them, so you should be able to connect a RITE export interface to the external port on an E-series blade, and run Security Onion inside your router. I've always wanted to try this, but I haven't been able to get funding yet to test it.

Friday, March 21, 2014

Quick Thoughts on the Micro Data Center

Here's something that's been on my radar lately: while all the talk in the networking world seems to be about the so-called "massively scalable" data center, almost all of the people I talk to in my world are dealing with the fact that data centers are rapidly getting smaller due to virtualization efficiencies. This seems to be the rule rather than the exception for small-to-medium sized enterprises.

In the micro data center that sits down the hall from me, for example, we've gone from 26 physical servers to 18 in the last few months, and we're scheduled to lose several more as older hypervisor hosts get replaced with newer, denser models. I suspect we'll eventually stabilize at around a dozen physical servers hosting in the low hundreds of VMs. We could get much denser, but things like political boundaries inevitably step in to keep the count higher than it might be otherwise. The case is similar in our other main facility.

From a networking perspective, this is interesting: I've heard vendor and VAR account managers remark lately that virtualization is cutting into their hardware sales. I'm most familiar with Cisco's offerings, and at least right now they don't seem to be looking at the micro-DC market as a niche: high-port count switches are basically all that are available. Cisco's design guide for the small-to-medium data center starts in the 100-300 10GE port range, which with modern virtualization hosts will support quite a few thousand typical enterprise VMs.

Having purchased the bulk of our older-generation servers before 10GE was cheap, we're just getting started with 10GE to the server access layer. Realistically, within a year or so a pair of redundant, reasonably feature-rich 24-32 port 10GE switches will be all we need for server access, probably in 10GBASE-T. Today, my best Cisco option seems to be the Nexus 9300 series, but it still has a lot of ports I'll never use.

One thought I've had is to standardize on the Catalyst 4500-X for all DC, campus core, and campus distribution use. With VSS, the topologies are simple. The space, power, and cooling requirements are minimal, and the redundancy is there. It has good layer 3 capabilities, along with excellent SPAN and NetFlow support. The only thing it seems to be lacking today is an upgrade path to 40GE, but that may be acceptable in low-port-density environments. Having one platform to manage would be nice. The drawbacks, of course, are a higher per-port cost and lack of scalability -- but again, scalability isn't really the problem.

Comments welcome.

Wednesday, March 5, 2014

Packet Capture in Diverse / Tunneled Networks?

(With the usual caveats that I am just a hick from Colorado, I don't know what I'm talking about, etc.)

I just read Pete Welcher's superb series on NSX, DFA, ACI, and other SDN stuff on the Chesapeake Netcraftsmen blog, and it helped me think more clearly about a problem that's been bothering me for a long time: how do we do realistically scalable packet capture in networks that make extensive use of ECMP and/or tunnels? Here's a sample network that Pete used:

Conventionally, we place packet capture devices at choke points in the network. But in medium-to-large data center designs, one of the main goals is to eliminate choke points: if we assume this is a relatively small standard ECMP leaf-spine design, each of the leaf switches has four equal-cost routed paths through the spine switches, and each spine switch has at least as many downlinks as there are leaf switches. The hypervisors each have two physical paths to the leaf switches, and in a high-density virtualization design we probably don't have a very good idea of what VM resides on what hypervisor at any point in time.

Now, add to that the tunneling features present in hypervisor-centric network virtualization schemes: traffic between two VMs attached to different hypervisors is tunneled inside VXLAN, GRE, or STT packets, depending on how you have things set up. The source and destination IP addresses of the "outer" hypervisor-to-hypervisor packet are not the sample as the "inner" VM-to-VM addresses, and presumably it's the latter that interest us. Thus, it's hard to even figure out which packets to capture. If we capture all of them (hard at any kind of scale; good 10G line-rate capture is still expensive and troublesome), we still have to filter for the inner tunnel headers to figure out what we're looking at.

What can we do? I see a few options:

  1. Put a huge mirror/tap switch between the leaf and spine. Gigamon makes some big ones, with up to 64 x 40GE ports or 256 x 10GE ports. When you max those out, start putting them at the end of each row. They advertise the ability to pop all kinds of different tunnel headers in hardware, along with lots of cool filtering and load-balancing capabilities.
  2. Buy or roll-your-own rack-mount packet capture appliances on commodity hardware, and run ad hoc SPAN sessions to them from the leaf switches.
  3. Install hypervisor-based packet capture VMs on all your hypervisors and capture from promiscuous mode vSwitches. There are lots of commercial solutions here, or you could roll your own. Update: Pete Welcher responded on Twitter and mentioned the option of doing packet capture pre-tunnel-encap or post-tunnel-decap. That's originally what I was thinking of with this option, but after reviewing a couple of his posts again, it appears there may be scenarios where the hypervisor makes tunnels to itself, so a better way of doing it might be to implement a packet capture API in the hypervisor itself that can control the point in the tunnel chain where the capture takes place. The next question is: where do we retrieve the capture? Does the API send the capture to a VM, save it on a datastore, dump it to a physical port, send it via another tunnel akin to ERSPAN? I'd want multiple options.
  4. Make sure Wireshark, tshark, or tcpdump is installed on every VM.
  5. Give up on intra-DC capture and focus only at the ingress/egress points.
Option 1 is the only one that really confronts both the network diversity and tunnel encapsulation head-on. Those boxes and their administrative overhead don't come cheap, but today this is probably the most fiddle-free option. The other options require a lot of customization and manual intervention that may or may not interfere with change-control procedures, and don't provide obvious solutions for de-obfuscating the tunneled traffic. Option 2 also suffers from serious scalability problems in ECMP designs. Option 5 just avoids the problem, but might work for some people.

However, the whole point of these designs is "SDN". What I *hope* is going to happen as SDN controllers start to become available is that the controller will be sufficiently aware of VM location that it can instruct the appropriate vSwitch OR leaf-switch OR spine switch to copy packets that meet a certain set of criteria to a particular destination port. Call it super-SPAN (can you tell I'm not headed for a new career in product naming?). It would be nice to be able to define the copied packets in different ways:
  • Conventional L3/L4 5-tuple. This would be nice because it could be informed by NetFlow/IPFIX data, without the need for DPI on the flow-exporter.
  • VM DNS name, port profile, or parent hypervisor.
  • QoS class.
  • "Application profile" -- it remains to be seen exactly what this means, but this is one of those SDN holy-grail things that allows more granular definition of traffic types.
Finally, it would be nice if the controller was smart enough to be able to load-balance the copied packets when necessary, so that the same capture target sees both sides of a given flow.

Again, I know nothing about plans for this stuff from any vendor. But I hope the powers-that-be in the SDN/etc world are thinking about at least some of these kinds of capabilities.

And... about the time they get that all figured out, we'll have to be dealing with a bunch of that traffic being encrypted between VMs or hypervisors...

Friday, February 14, 2014

Programmatically Configuring Interface Descriptions from Phone Descriptions

I wrote some Python code that allows you to do the following:
  1. Query a Catalyst switch CDP neighbor table from its HTTPS interface,
  2. Extract the device names of the attached IP phones,
  3. Query Communications Manager for the IP phone device description, 
  4. Apply the device description as the switch interface description.
Obviously, this makes it much easier to see whose phone is attached to a switch port.

I hope that this example saves someone the head-banging that I incurred while trying to figure out the AXL XML/SOAP API for Communications Manager.

I haven't tested this extensively; all my testing has been on Catalyst 3560 and 3750 switches and CUCM version 8.6. Using the --auto switch to automatically configure the switch is quite slow; this is a limitation of the HTTPS interface rather than the script code. It may be faster to leave that option off and manually copy/paste the printed configuration if you're in a hurry.

Note that your switch must be configured to allow configuration via the HTTPS interface; you may need to modify your TACACS/etc. configurations accordingly.

All the relevant info is in the Github repo.

Monday, January 20, 2014

Why Network Engineers Should Learn Programming

Because Microsoft Excel is not a text editor. Seriously.

This is a followup to the previous post, inspired by Ethan Banks of Packet Pushers. If you do operational networking at all, you deal with text files all the time: logs, debug output, configuration files, command line diagnostics, and more. I'm constantly amazed when I see people open Word or Excel to do their text editing, often one keystroke at a time.

The number one reason to learn basic programming is to automate that stuff. Personally, I use a combination of traditional Unix shell tools and Python to get the job done, but you could probably do it all with one or the other.

There are lots of other reasons to learn programming too, many of which will be discussed on an upcoming Packet Pushers episode. But if you don't believe any of those, this one alone makes it worth the effort.

Step away from the spreadsheet. Do it now.

Friday, January 17, 2014

Quick Thoughts on Learning Python

I was scheduled to be a guest on an upcoming episode of the Packet Pushers podcast, on the topic of Python for network engineers. Unfortunately due to bad luck I'm not going to be able to make the recording. Here are some quick thoughts on learning Python. If you're already an expert programmer you already know how to learn languages, so this post isn't for you.

Scenario 1: You've coded in another language, but you're not an expert.
I would start with the basic Python class at Google Code. It's targeted specifically at people who know basic programming skills in some other language. It was perfect for me; I went through the exercises and was able to quickly start writing simple, useful Python scripts.

Scenario 2: You don't know how to write code at all.
Start with the Udacity CS101 class if you like guided learning, or Learn Python the Hard Way if you prefer books. Be prepared to spend a lot of time on either. It's not easy the first time around.

After you've gotten through one of those two scenarios, do the following:

  1. Spend time browsing the documentation for the Python Standard Library. Python is a large language, and chances are there's something in the standard library that will help you meet your goals. If you find yourself writing a lot of lines of code to accomplish something fairly simple, look harder. I recommend skimming the documentation for every module, then looking more carefully at the ones that interest you.
  2. Matt Harrison's books on basic and intermediate Python are excellent. I recommend buying them and reading them.
  3. Jeff Knupp's Writing Idiomatic Python is really good as you gain skills. It's a bit rough around the edges, but it will help you avoid common beginner mistakes and is well worth the read.
  4. Practice a lot. I enjoy the math puzzles on Project Euler. These are not Python specific, but their structure makes them well suited to quick problem solving in any language. Beware -- it's very addictive!

Wednesday, January 8, 2014

Using Bro DNS Logging for Network Management

I was recently asked if someone in our desktop support group could get alerted when certain laptops connected to the corporate network. We have a lot of employees who work at industrial locations and rarely connect their machines to our internal networks, so the support group likes to take those rare opportunities to do management tasks that aren't otherwise automated.

The two mechanisms that came to mind for alerting on these events are DHCP address assignment, and DNS autoregistration. While we do send DHCP logs to a central archive, the process of alerting on a frequently changing list of hostnames would be somewhat cumbersome. I have been looking for ways to use Bro for network management tasks, so this seemed like a natural use case.

We already had Bro instances monitoring DNS traffic for two of our central DNS servers. I don't fully understand how Windows DNS autoregistration works, but from looking at the Bro logs, it appears that the DHCP server sends a DNS SOA query to the central DNS servers containing the hostname of the device to which it assigns a lease.

I wanted to learn how to use the input framework in Bro 2.2, so I wrote the following script and loaded it from local.bro:

This raises a Bro notice whenever one of the hostnames in the hostnames.txt file is seen in a DNS SOA query. I then set up local.bro and broctl to email this notice type to the correct person.

This works for now, but I'd love to hear from any more experienced Bro programmers about better ways to do it.

Thursday, January 2, 2014

New Year Resolution: Code Cleanup

I enjoyed Ethan Banks' post on New Year's Thoughts: Start with Documentation, so I thought I'd write about what I'm doing this week: code cleanup. Over the last couple of years I've written a decent amount of code to automate mundane network management tasks. As quick one-off hacks have turned into things that I actually depend on, I've noticed a lot of ugliness that I want to fix.

Everything here is assuming Python as the language of choice:
  • For scripts that send email, check to make sure the list of mail receivers is up-to-date.
  • Look for those nasty embedded IP addresses and replace them with DNS names.
  • Change from old-style open(FILE)/close(FILE) constructs to with open(FILE) as f constructs.
  • Get rid of "pickles" for persistent structured data storage. Pickles are a form of native Python object serialization that are quick and convenient, but have a lot of potential problems. I've mostly used Python's native SQLite3 library to store data in simple persistent databases, but occasionally I just use plain text files.
  • Look for repetitive code in the main script logic and try to move it into functions or classes where possible. For example, I had several scripts that were building email reports via clunky string concatenation, so I created a Report class that knows how to append to itself and do simple formatting.
  • Remove unused module imports (that were usually there for debugging).
  • Standardize module imports, declaration of constants, etc.
  • Add more comments!
  • Remove old code that was commented out for whatever reason.
  • Look for places where I was creating huge lists in memory and try to figure out a way to reduce memory consumption with generators.
Things I haven't done, but probably should:
Things I haven't done, and probably never will: