Friday, April 27, 2012

CUCM CDR Cause Codes

Lately I've been doing a fair amount of work with CUCM call detail records. The cause codes are always a pain, since they appear as numeric codes whose meanings are documented only in otherwise obscure documents.

To help with this, I created a big list of all the different cause codes used in Cisco Unified Communication Manager call detail records, put them into Python dictionary format, and posted them on Github:

https://github.com/jayswan/cdr/blob/master/causecodes.py

They should be human readable as-is, and easy enough to convert into hash tables for use in other scripting languages. Hopefully this will save someone from having to pore over the Cisco PDFs that are otherwise the only source for this information. I used the CUCM 6.x documentation when creating the list, but it should be quite similar for other versions.
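
As a quick illustration, here's roughly how the list can be used to annotate raw CDR rows in Python. The entries and the CSV file name below are illustrative only; causecodes.py has the full dictionary:

    import csv

    # A few representative entries in the style of causecodes.py;
    # the file on Github has the complete list.
    cause_codes = {
        0: 'No error',
        16: 'Normal call clearing',
        17: 'User busy',
    }

    # Annotate each CDR row with a human-readable cause.
    # 'origCause_value' is one of the cause fields in the CUCM CDR
    # schema; 'cdr_export.csv' is a hypothetical export file.
    with open('cdr_export.csv') as f:
        for row in csv.DictReader(f):
            code = int(row['origCause_value'])
            print(code, cause_codes.get(code, 'unknown cause code'))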

One of these days I'll clean up my CDR analysis scripts enough for public consumption.

Wednesday, April 18, 2012

Cisco IOS 15.2M and 15.2T Command Reference & Configuration Guides

For my own reference: as of today, Cisco's search tool doesn't show these on the first page of results even if you search for the explicit document names. Google and DuckDuckGo do better, but don't find the root pages.

Command References
Configuration Guides

Monday, April 16, 2012

CCIE: Five Year Reflections

I passed the CCIE Routing & Switching lab five years ago today. Back then my number seemed enormous, but five years later it's already below the halfway point of the numbers issued so far (as of this writing I believe they're in the mid-to-high thirty-thousands). A lot has changed since then: "data center" and "cloud" seem to have taken over almost completely as the hot topics in network engineering (with "software defined networking" hot on their heels), and Cisco seems to have lost some of its shine due to its rapid diversification into markets outside pure networking and the rise of tough competitors in networking niche markets. We've gone through a huge economic contraction that we may or may not be exiting. In the certification world, Cisco has added several new tracks, and other companies have added their own coveted expert-level certifications.

I want to write about a few trends in certification and professional development that I've either observed personally, or that seem to be the subject of frequent discussion on the Internets.

Consolidation
One of the most interesting things I've noticed as a regular attendee at Cisco Live is that almost all CCIEs are in one of three categories:
  1. Consultants working for Cisco resellers.
  2. Employees of Cisco or one of its competitors.
  3. Instructors working in the training and certification business.
Maybe this is sampling bias: perhaps it's just that the majority of CCIEs who attend Cisco Live also fall into one of those categories, and the ones who don't aren't attending in droves. Still, it seems comparatively rare to find CCIEs who are actually employed full time in network design or operations for a single company. I think one reason for this is that as IT employees in operational positions gain experience and seniority, their training and professional development opportunities decrease, possibly due to increasing costs, lack of availability of advanced training, and reluctance of employers to have their A-Teamers away from the office.

I think this is unfortunate, and it may be one cause behind the churn that companies tend to experience among high-level technical employees. The expense of maintaining training and professional development programs for these employees may also be a factor in the amount of outsourcing that we see in the network engineering field.

Track Proliferation and Specialization
I feel like the CCIE Routing & Switching track is kind of like a black belt in a legitimate martial art: it represents a thorough mastery of the basics, impresses novices who don't know any better, and hopefully impresses upon its recipients that they are really just at the beginning of the path. It still seems to me like it would be hard to pass the CCIE lab without understanding fundamental networking really well, but apparently it is possible; it's not uncommon to read about "paper CCIEs", and I've met at least one myself.

For me, the whole motivation behind studying for the lab was to confirm and exercise my understanding of the basics. I'm not a consultant or reseller, and I'm no longer a trainer; I actually work on the same network every day, and although my employer was very supportive of my studies, they certainly didn't require them. This motivation is one of the reasons that I haven't gone on to another track: they're too product-specific for what I do. I work daily with Cisco security, voice, and wireless products, but I'm not intellectually driven by that kind of product specificity in anything resembling the way that I'm driven by the underlying theory and practice of general networking. The logical next step for me would probably be the CCDE, and indeed I was lucky enough to be invited as a beta participant in that program. I got spanked badly on the practical and haven't gone back, at least partly because the exam made it clear to me that even if I passed, I wouldn't have the real-world experience of working on multi-thousand-router networks to go along with it.

Defining the Super-Generalist
None of this is meant to diminish the accomplishment inherent in the other CCIE tracks in any way: I remain extremely impressed by my friends who have passed them. However, for people working in mainstream IT networking, my observation has been that the world could use more super-generalists. What skills should the super-generalist have? Here's my take:

[Edited to add: I'm not saying that this is a high-level skill set that substitutes for a CCIE. I'm saying this is a good base for working towards CCIE, and that if you find yourself missing big chunks of this while working on your second CCIE, you might consider re-prioritizing your learning.]
  • Extremely solid IPv4 networking fundamentals. Certification programs are supposed to emphasize the basics, but I see CCNP-level people who haven't yet fully grokked ARP, STP, connection-oriented vs connectionless concepts, or why routing protocols work the way they do, even if they can explain how they work.
  • A growing familiarity with IPv6, and an appreciation of how protocols other than IPv4 have attempted to solve common problems.
  • The ability to use Wireshark and tcpdump and interpret the resulting data.
  • An understanding of the inner workings of common application-layer protocols, especially HTTP, DNS, and SMTP (yeah yeah, you can say email is dead but people still scream when it breaks). People can and do make entire careers out of each one, but understanding the basics is imperative. I am always amazed at how common it is to see server admins who don't understand HTTP response codes or how a recursive DNS query works.
  • A familiarity with the internals of both Windows and Linux.
  • Familiarity with common virtual machine platforms and how they affect networks.
  • The basics of a scripting language and the common automation tools in the platforms with which you work most frequently.
  • Fundamentals of network monitoring: SNMP, NetFlow, syslog, WMI, taps and mirror ports, considerations for asymmetric flows, etc.
  • The basics of databases. This has long been one of my weakest areas, and something I've been working on fixing.
  • The security considerations surrounding all of the above--and not just from a control standpoint. It's not enough to just know packet filtering and encryption; you also need to understand more than a little about the psychological aspects of security and privacy, and you should understand how your monitoring and diagnostic tools can be used both for good and ill.
  • The big picture of how the Internet works: what BGP is and the common ways that ISPs connect to customers and to each other, what CDNs are, the role of IANA and the RIRs, what the IETF and RFCs are, etc.
  • A little respect for the ones who have gone before us, and some knowledge of Internet folklore. You damn well ought to know a little about the likes of Paul Baran, Jon Postel, Vint Cerf, Radia Perlman, and many others.
  • The ability to write and speak coherently!
I'm sure I've left a few things out (add them in the comments), but even with just these you can iterate through them for years on end.

Thursday, April 12, 2012

Thoughts on Udacity CS101

Over the last 7 weeks I took the first-ever offering of Udacity's CS101 class. This was billed as a free basic computer science class for raw beginners, using Python as the language of choice. I'm neither a beginner nor a skilled programmer: I started programming more years ago than I care to admit, but I've never done it as a core part of my job, and I'm entirely self-taught. Over the years I've used only a small number of languages:
  • BASIC, back in the Apple ][ days
  • Perl (off and on--mostly off), since the Perl 4.x days
  • Objective-C (back on the OpenStep platform, before Mac OS X revived it)
  • Python
  • I've also played around with several other languages, including C, Pascal, Tcl, and JavaScript, but haven't done anything with them beyond the play stage.
I started my exposure to Python maybe 18 months ago, when I had a few small work-related scripting projects and wanted to learn something new. I've historically used Perl for networking-related scripts, and although I've probably used it more than any other language, it's never really clicked for me in an intuitive way.

I started by working through the introductory Python course on Google Code, and it immediately felt right; I was able to quickly start writing small useful scripts without a lot of trouble. After that, I spent some time working through some other tutorials, materials from PyCon, and puzzles at Project Euler. I also wrote quite a few small projects at work, some of which I've blogged about here. Naturally, I was excited when I heard about Udacity's new curriculum; I've always wanted to take some CS classes to fill in holes in my base knowledge, but I've never had time. So, how did it go?

The good:
  • The user interface and website functionality were GREAT. I loved the format and delivery style of the videos. The tablet interface that the instructors use to deliver the course material is outstanding.
  • The embedded Python interpreter works really well. Only a few times did I feel it was necessary to code outside the embedded tools.
  • The homework assignments were well crafted and fairly graded.
  • The instructors were excellent. I really liked the "field trips" into real-world environments and the trips into computing history.
  • The short class cycles are really nice, and make it easier to keep up with the course.
Caveats (I'm not saying this is bad... just stuff to be aware of):
  • This doesn't seem to be a course for raw beginners, unless you have a ton of time to put into it. It started off really slow, but quickly ramped up the pace. If I had had no programming background I wouldn't have been able to keep up after the first few weeks.
  • It doesn't seem to be directed at intermediate programmers either. With the exception of the final sections on recursion, I didn't learn anything completely new, but the experience of doing the homework was still excellent for practicing and clarifying the basics. That said, I don't think they could have done it much better... when you are offering a class to thousands of participants, you need to make some tough choices. Frame your expectations accordingly.
  • I wish that they had introduced some core Python concepts earlier and worked backwards into implementation details, rather than working forwards from smaller pieces. For example: before learning about Python dictionaries (i.e. hash tables), we had to implement a simple hash table using lists. This was interesting, but tedious--and I had the advantage of being able to see where they were going with it. I felt like it would have been better to introduce the dictionary first, then explain how you would implement it using simpler components. They also skipped some of the constructs that make Python seem so much more powerful and intuitive than some other languages, such as list comprehensions and generator expressions (see the short example after this list). The instructors have much more experience with teaching the material than I ever will, however--so maybe their way works better for the majority of students.
  • I never got into the discussion forums--they seemed rather chaotic, and every time I checked them I had to wade through a lot of posts complaining about issues with the grading. At the same time, the forums are really the only place you can get any personalized assistance, so I guess I'll need to figure them out in the future.
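For anyone wondering what those constructs buy you, here's a tiny sketch of my own (not from the course) comparing the loop-and-append style the class taught with a list comprehension and a generator expression:

    # Squares of even numbers, in the loop-and-append style the course used:
    squares = []
    for n in range(10):
        if n % 2 == 0:
            squares.append(n * n)

    # The same list built with a list comprehension:
    squares = [n * n for n in range(10) if n % 2 == 0]

    # A generator expression computes the values lazily instead of
    # building the whole list in memory first:
    total = sum(n * n for n in range(10) if n % 2 == 0)
    print(squares, total)  # [0, 4, 16, 36, 64] 120
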
Overall, I thought it was a great experience and I'll definitely be taking another course from Udacity in the future.

Thursday, April 5, 2012

Why NetFlow Isn't A Web Usage Tracker

2013 Update:
As I mentioned in an update to the original post, HTTP tracking is now available via custom IPFIX exports in a variety of products. Cisco's ISR G2 and ASR 1K routers now have this export available as part of their MACE feature set (Data license required). You'll still need a collector capable of dealing with this export record, however. The content below still applies to NetFlow v5 and traditional NetFlow v9.

Here's a question I find myself answering frequently on the Solarwinds NetFlow forum:

How can I use NetFlow to track the websites being accessed from my network?

The short answer that I usually give on the forum is this: you can't, because NetFlow v5 doesn't track HTTP headers. With this blog post, though, I'll go into the answer in more detail so that I can refer people to it in the future.

First, a quick review of what NetFlow is, and how it works:
  • When NetFlow is enabled on a router interface, the router begins to track information about the traffic that transits the interface. This information is stored in a data structure called the flow cache.
  • Periodically, the contents of the flow cache can be exported to a "collector", which is a process running on an external system that receives and stores flow data. This process is called "NetFlow Data Export", or NDE. Typically the collector is tied into an "analyzer", which massages the flow data into something useful for human network analysts.
    • NDE is optional. One can gather useful information from NetFlow solely from the command line, without ever using an external collector.
  • The data that NetFlow can track depends on the version. The most commonly deployed version today is NetFlow version 5, which tracks the following key fields:
    • Source interface
    • Source and destination IP address
    • Layer 4 protocol (e.g., ICMP, TCP, UDP, OSPF, ESP, etc.)
    • Source and destination port number (if the layer 4 protocol is TCP or UDP)
    • Type of service value
  • These "key fields" are used to define a "flow"; that is, a unidirectional stream of packets between a pair of hosts. Because flows are unidirectional, an important feature in NetFlow analysis software is the ability to pair the two sides of a flow to give a complete picture of the conversation (there's a small sketch of this after the list).
  • Other "non-key" fields are also tracked. In NetFlow version 5 they are as follows (note that not all collector software preserves all of these fields):
    • TCP flags (used by the router to determine the beginning and end of a TCP flow)
    • Egress interface
    • Timestamps
    • Packet and byte count for the flow
    • BGP origin AS and peer AS
    • IP next-hop
    • Source and destination netmask
  • NetFlow v9, Cisco Flexible NetFlow, and IPFIX (the IETF flow protocol, which is very similar to NetFlow v9) allow user-defined fields that can track any part of the packet headers. IPFIX offers enough flexibility to track information about HTTP sessions, and many vendors are starting to implement this capability.
  • Many vendors have defined other flow protocols that offer more or fewer capabilities, but virtually all of them duplicate at least the functions of NetFlow v5.
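To make the key-field idea concrete, here's a minimal Python sketch of my own (not Cisco's implementation) showing how a flow cache groups packets by their key fields, and how an analyzer might pair the two unidirectional flows of a conversation:

    from collections import defaultdict

    # Toy packet records standing in for parsed packet headers.
    packets = [
        {'src': '10.0.0.1', 'dst': '192.0.2.10', 'proto': 6,
         'sport': 49152, 'dport': 80, 'length': 60},
        {'src': '192.0.2.10', 'dst': '10.0.0.1', 'proto': 6,
         'sport': 80, 'dport': 49152, 'length': 1500},
    ]

    def flow_key(p):
        # NetFlow v5 key fields (minus source interface and ToS, for brevity).
        return (p['src'], p['dst'], p['proto'], p['sport'], p['dport'])

    def reverse_key(k):
        src, dst, proto, sport, dport = k
        return (dst, src, proto, dport, sport)

    # The "flow cache": one unidirectional entry per distinct key.
    cache = defaultdict(lambda: {'packets': 0, 'bytes': 0})
    for p in packets:
        entry = cache[flow_key(p)]
        entry['packets'] += 1
        entry['bytes'] += p['length']

    # Pair the two directions of each conversation, as an analyzer would.
    for k in list(cache):
        rk = reverse_key(k)
        if rk in cache and k < rk:
            print('conversation:', k, '<->', rk)
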
For reference, here's a snapshot from a packet capture of a NetFlow v5 export packet (the destination public IP address has been disguised as an RFC 1918 address):

    pdu 1/30
        SrcAddr: 203.79.123.118 (203.79.123.118)
        DstAddr: 10.118.218.102 (10.118.218.102)
        NextHop: 0.0.0.0 (0.0.0.0)
        InputInt: 1
        OutputInt: 0
        Packets: 3
        Octets: 144
        [Duration: 1.388000000 seconds]
            StartTime: 3422510.740000000 seconds
            EndTime: 3422512.128000000 seconds
        SrcPort: 3546
        DstPort: 445 <-- probably a port scan for open Microsoft services
        padding
        TCP Flags: 0x02
        Protocol: 6  <-- this is the layer 4 protocol; i.e. TCP
        IP ToS: 0x00
        SrcAS: 4768 <-- this particular router is tracking BGP Origin-AS
        DstAS: 0
        SrcMask: 22 (prefix: 203.79.120.0/22)
        DstMask: 30 (prefix: 10.118.218.100/30)
        padding
Returning to our original question:

NetFlow v5 isn't a good web usage tracker because nowhere in the list of fields above do we see "HTTP header". The HTTP header is the part of the application-layer payload that actually specifies the website and URL being requested. Here's a sample from another packet capture:

    GET / HTTP/1.1
    User-Agent: curl/7.21.6 (i686-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
    Host: www.ubuntu.com
    Accept: */*

This is the request sent by the HTTP client (in this case the "curl" command-line HTTP utility) when accessing http://www.ubuntu.com. The request line "GET / HTTP/1.1" asks for the root ("/") of the website referenced by the "Host:" header; i.e., www.ubuntu.com.

The IP address used in this request was 91.189.89.88. However, if we do a reverse lookup on this address, the record returned is different:

    $ dig -x 91.189.89.88 +short
    privet.canonical.com.
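
The same lookup is easy to script with Python's standard library, if you want to annotate a whole flow export:

    import socket

    # Reverse (PTR) lookup, equivalent to 'dig -x':
    name, aliases, addresses = socket.gethostbyaddr('91.189.89.88')
    print(name)  # privet.canonical.com, per the dig output above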

A little search-engine-fu shows that several other websites are hosted at the same IP address:

    kubuntu.org
    canonical.com

If we do the same trick with other websites (like unroutable.blogspot.com, hosted by Google), we can easily find cases in which there are dozens of websites hosted at the same IP address.
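
You can watch virtual hosting in action by requesting different sites from the same IP address, varying only the Host header. Here's a quick sketch with Python's http.client (httplib in Python 2), reusing the names from above; whether those particular sites remain co-hosted at that address is, of course, subject to change:

    import http.client

    # Request the root of two different websites from the same IP address,
    # changing only the HTTP Host header. NetFlow would record essentially
    # identical flows for both; only the application payload differs.
    for host in ('www.ubuntu.com', 'kubuntu.org'):
        conn = http.client.HTTPConnection('91.189.89.88', 80, timeout=5)
        conn.request('GET', '/', headers={'Host': host})
        resp = conn.getresponse()
        print(host, resp.status, resp.reason)
        conn.close()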

Because NetFlow doesn't extract the HTTP header from TCP flows, we have only the IP address to go on. As we've seen here, many different websites can be hosted at the same IP address; there's no way to tell just from NetFlow whether a user visited www.canonical.com or www.ubuntu.com. Furthermore, with the most popular sites hosted on content distribution caches or cloud service providers, the reverse DNS lookups for high-bandwidth port 80 flows frequently resolve to names in networks like Akamai, Limelight, Google, Amazon Web Services, Rackspace, etc., even if those content distribution networks have nothing to do with the content of the actual website that was visited.

The bottom line is this: if you want to track what websites are visited by users on a network, NetFlow v5 isn't the best tool, or even a good one. A web proxy (e.g., Squid) or a web content filter (e.g., Websense, Cisco WSA, etc.) is probably the best tool, since these track not only HTTP Host headers but also (usually) the Active Directory username associated with the request.

Other tools that could do the job are security-related tools like httpry or Bro-IDS, both of which have features for HTTP request tracking. Both are available in the excellent Security Onion Linux distribution.

[Edited to add] The anonymous commenter below observes that nProbe exports HTTP header information via IPFIX, and notes that some vendors have firewalls that do so as well. nProbe is an excellent free tool that takes a raw packet stream and converts it to NetFlow or IPFIX export format.