Loopback Mountain: Why NetFlow Isn't A Web Usage Tracker

2013 Update:
As I mentioned in an update to the original post, HTTP tracking is now available via custom IPFIX exports in a variety of products. Cisco's ISR G2 and ASR 1K routers now have this export available as part of their MACE feature set (Data license required). You'll still need a collector capable of dealing with this export record, however. The content below still applies to NetFlow v5 and traditional NetFlow v9.

Here's a question I find myself answering frequently on the SolarWinds NetFlow forum:

How can I use NetFlow to track the websites being accessed from my network?

The short answer that I usually give on the forum is this: you can't, because NetFlow v5 doesn't track HTTP headers. With this blog post, though, I'll go into the answer in more detail so that I can refer people to it in the future.

First, a quick review of what NetFlow is, and how it works:

When NetFlow is enabled on a router interface, the router begins to track information about the traffic that transits the interface. This information is stored in a data structure called the flow cache.
Periodically, the contents of the flow cache can be exported to a "collector", which is a process running on an external system that receives and stores flow data. This process is called "NetFlow Data Export", or NDE. Typically the collector is tied into an "analyzer", which massages the flow data into something useful for human network analysts.
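
To make the export step a little more concrete, here's a minimal sketch of what a collector might do with each datagram it receives, assuming the exporter is sending NetFlow v5 (a 24-byte header followed by 48-byte flow records). The function and dictionary key names are my own for illustration, not part of any product.

```python
import struct

# NetFlow v5 export header: 24 bytes, network byte order.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD_LEN = 48  # each flow record in a v5 export is 48 bytes


def split_v5_datagram(datagram: bytes):
    """Return (header, raw_records) for one NetFlow v5 export datagram.

    Illustrative helper: a real collector would also track flow_sequence
    gaps to detect lost exports, which is not done here.
    """
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v5 header")
    (version, count, sys_uptime_ms, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling) = V5_HEADER.unpack_from(datagram)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 export (version={version})")
    body = datagram[V5_HEADER.size:]
    if len(body) < count * V5_RECORD_LEN:
        raise ValueError("datagram truncated: fewer records than the header claims")
    # Slice the body into fixed-size records; decoding them is a separate step.
    records = [body[i * V5_RECORD_LEN:(i + 1) * V5_RECORD_LEN] for i in range(count)]
    header = {
        "version": version,
        "count": count,
        "sys_uptime_ms": sys_uptime_ms,
        "unix_secs": unix_secs,
        "flow_sequence": flow_sequence,
    }
    return header, records
```

A collector would call something like this on every UDP datagram from the router and hand the raw records to whatever stores or analyzes them.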

NDE is optional. One can gather useful information from NetFlow solely from the command-line without ever using an external collector.

Data that can be tracked by NetFlow depends on the version. The most commonly deployed version today is NetFlow version 5, which tracks the following key fields:

Source interface
Source and destination IP address
Layer 4 protocol (e.g., ICMP, TCP, UDP, OSPF, ESP)
Source and destination port number (if the layer 4 protocol is TCP or UDP)
Type of service value

These "key fields" are used to define a "flow"; that is, a unidirectional conversation between a pair of hosts. Because flows are unidirectional, an important feature in NetFlow analysis software is the ability to pair the two sides of a flow to give a complete picture of the conversation.
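
Pairing the two sides of a conversation can be sketched roughly as follows. This is a simplification under my own assumptions: it matches on addresses, ports, and protocol only, and ignores the input interface and ToS (which normally differ between the two directions). The names `FlowKey`, `reverse_key`, and `pair_flows` are hypothetical.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def reverse_key(key: FlowKey) -> FlowKey:
    """Key that the return traffic of this flow would carry."""
    return FlowKey(key.dst_ip, key.src_ip, key.protocol, key.dst_port, key.src_port)


def pair_flows(keys):
    """Split unidirectional flow keys into (paired, unpaired) lists."""
    seen = set(keys)
    done = set()
    paired, unpaired = [], []
    for key in keys:
        if key in done:
            continue
        rev = reverse_key(key)
        if rev in seen and rev != key:
            # Both directions were exported: report them as one conversation.
            paired.append((key, rev))
            done.update((key, rev))
        else:
            # Only one direction was seen (e.g. a scan that got no reply).
            unpaired.append(key)
            done.add(key)
    return paired, unpaired
```

An analyzer built this way would show a web request and its response as one conversation, while a one-way probe stays a single flow.
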
Other "non-key" fields are also tracked. In NetFlow version 5, the other fields are as follows. Note that not all collector software preserves all the fields.

TCP flags (used by the router to determine the beginning and end of a TCP flow)
Egress interface
Timestamps
Packet and byte count for the flow
BGP origin AS and peer AS
IP next-hop
Source and destination netmask
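
To illustrate how key and non-key fields relate, here's a toy flow cache in Python. It's only a sketch of the idea, not how any router implements it: the packet is assumed to arrive as a dictionary with field names I've made up, the key fields select the cache entry, and the non-key fields (counters, timestamps, TCP flags) are accumulated in that entry.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0  # cumulative OR of the flags seen in the flow


def update_cache(cache: dict, pkt: dict):
    """Account one packet against the flow cache and return its flow key."""
    # The v5 key fields: packets that agree on all of these share a flow.
    key = (pkt["input_if"], pkt["src_ip"], pkt["dst_ip"], pkt["protocol"],
           pkt["src_port"], pkt["dst_port"], pkt["tos"])
    entry = cache.get(key)
    if entry is None:
        entry = cache[key] = FlowEntry(first_seen=pkt["time"], last_seen=pkt["time"])
    # Non-key fields are updated, never used to pick the entry.
    entry.last_seen = pkt["time"]
    entry.packets += 1
    entry.octets += pkt["length"]
    entry.tcp_flags |= pkt.get("tcp_flags", 0)
    return key
```

Notice that nothing from the packet payload is ever looked at: the cache is fed entirely from the IP and TCP/UDP headers plus the interface the packet arrived on.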

NetFlow v9, Cisco Flexible NetFlow, and IPFIX (the IETF flow protocol, which is very similar to NetFlow v9) allow user-defined fields that can track any part of the packet headers. IPFIX offers enough flexibility to track information about HTTP sessions, and many vendors are starting to implement this capability.
Many vendors have defined other flow protocols that offer more or fewer capabilities, but virtually all of them duplicate at least the functions of NetFlow v5.
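
The flexibility of NetFlow v9 and IPFIX comes from templates: the exporter first describes which fields each record contains, and the collector decodes records against that description. Here's a rough sketch of that idea in Python. It is not a full parser (no packet header, flowsets, or template IDs), the function name is my own, and it only knows a handful of the standard field type numbers that v9 and IPFIX share. Any HTTP-related fields would be vendor-defined and are not shown.

```python
import ipaddress

# A few field type numbers common to NetFlow v9 and IPFIX.
FIELD_NAMES = {
    1: "octets",
    2: "packets",
    4: "protocol",
    7: "src_port",
    8: "src_ipv4",
    11: "dst_port",
    12: "dst_ipv4",
}
IPV4_FIELDS = {8, 12}


def decode_record(template, data: bytes) -> dict:
    """Decode one data record using a template of (field_type, length) pairs."""
    record, offset = {}, 0
    for field_type, length in template:
        raw = data[offset:offset + length]
        if len(raw) != length:
            raise ValueError("record is shorter than its template")
        offset += length
        # Unknown field types are kept under a generic name, not dropped.
        name = FIELD_NAMES.get(field_type, f"field_{field_type}")
        if field_type in IPV4_FIELDS:
            record[name] = str(ipaddress.IPv4Address(raw))
        else:
            record[name] = int.from_bytes(raw, "big")
    return record
```

The point of the design is that the record layout is data, not code: an exporter that adds a new field only has to announce it in the template.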

For reference, here's a snapshot from a packet capture of a NetFlow v5 export packet (the destination public IP address has been disguised as an RFC 1918 address):

    pdu 1/30
        SrcAddr: 203.79.123.118 (203.79.123.118)
        DstAddr: 10.118.218.102 (10.118.218.102)
        NextHop: 0.0.0.0 (0.0.0.0)
        InputInt: 1
        OutputInt: 0
        Packets: 3
        Octets: 144
        [Duration: 1.388000000 seconds]
            StartTime: 3422510.740000000 seconds
            EndTime: 3422512.128000000 seconds
        SrcPort: 3546
        DstPort: 445 <-- probably a port scan for open Microsoft services
        padding
        TCP Flags: 0x02
        Protocol: 6 <-- this is the layer 4 protocol; i.e. TCP
        IP ToS: 0x00
        SrcAS: 4768 <-- this particular router is tracking BGP Origin-AS
        DstAS: 0
        SrcMask: 22 (prefix: 203.79.120.0/22)
        DstMask: 30 (prefix: 10.118.218.100/30)
        padding
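
For anyone who wants to decode a record like this by hand, here's a small Python sketch that unpacks one 48-byte NetFlow v5 flow record with the standard library. The dictionary key names are my own, and the sketch ignores details such as sysUpTime wrap-around.

```python
import ipaddress
import struct

# One NetFlow v5 flow record: 48 bytes, network byte order.
# "x" marks the padding bytes that show up in the capture.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def decode_v5_record(raw: bytes) -> dict:
    """Decode a single 48-byte NetFlow v5 flow record into a dict."""
    (src, dst, nexthop, input_if, output_if, packets, octets,
     first_ms, last_ms, src_port, dst_port, tcp_flags, protocol,
     tos, src_as, dst_as, src_mask, dst_mask) = V5_RECORD.unpack(raw)
    src_ip = ipaddress.IPv4Address(src)
    dst_ip = ipaddress.IPv4Address(dst)
    return {
        "src_ip": str(src_ip),
        "dst_ip": str(dst_ip),
        "next_hop": str(ipaddress.IPv4Address(nexthop)),
        "input_if": input_if,
        "output_if": output_if,
        "packets": packets,
        "octets": octets,
        # Start and end times are router uptime in milliseconds.
        "duration_ms": last_ms - first_ms,
        "src_port": src_port,
        "dst_port": dst_port,
        "tcp_flags": tcp_flags,
        "protocol": protocol,
        "tos": tos,
        "src_as": src_as,
        "dst_as": dst_as,
        # Apply the exported mask to recover the routed prefix.
        "src_prefix": str(ipaddress.ip_network(f"{src_ip}/{src_mask}", strict=False)),
        "dst_prefix": str(ipaddress.ip_network(f"{dst_ip}/{dst_mask}", strict=False)),
    }
```

Every field the record carries is in that dictionary, and none of them comes from the application payload.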

Returning to our original question:

NetFlow v5 isn't a good web usage tracker because nowhere in the list of fields above do we see "HTTP header". The HTTP header is the part of the application layer payload that actually specifies the website and URL that's being requested. Here's a sample from another packet capture:

    GET / HTTP/1.1
    User-Agent: curl/7.21.6 (i686-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
    Host: www.ubuntu.com
    Accept: */*

This is the request sent by the HTTP client (in this case the "curl" command-line HTTP utility) when accessing http://www.ubuntu.com. The request line "GET / HTTP/1.1" asks for the root ("/") of the website named in the "Host:" header; i.e. www.ubuntu.com.
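
Here's a minimal sketch of what a tool would have to do to recover the website from a request like this one. It assumes the tool has the raw bytes of the HTTP request in hand, which is exactly what a flow record does not provide. The function name is hypothetical, and the parsing is deliberately simplistic.

```python
def extract_host_and_path(request: bytes):
    """Return (method, host, path) from the raw bytes of an HTTP/1.x request."""
    # Headers end at the first blank line; anything after that is the body.
    head, _, _ = request.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    host = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive.
        if sep and name.strip().lower() == "host":
            host = value.strip()
            break
    return method, host, path
```

Web proxies and HTTP-aware sensors do a more robust version of this on every request, which is why they can report hostnames and URLs.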

The IP address used in this request was 91.189.89.88. However, if we do a reverse lookup on this address, the record returned is different:

    $ dig -x 91.189.89.88 +short
    privet.canonical.com.

A little search-engine-fu shows that several other websites are hosted at the same IP address:

kubuntu.org
canonical.com

If we do the same trick with other websites (like unroutable.blogspot.com, hosted by Google), we can easily find cases in which there are dozens of websites hosted at the same IP address.

Because NetFlow doesn't extract the HTTP header from TCP flows, we have only the IP address to go on. As we've seen here, many different websites can be hosted at the same IP address; there's no way to tell just from NetFlow whether a user visited www.canonical.com or www.ubuntu.com. Furthermore, with the most popular sites hosted on content distribution caches or cloud service providers, the reverse DNS lookups for high-bandwidth port 80 flows frequently resolve to names in networks like Akamai, Limelight, Google, Amazon Web Services, Rackspace, etc., even if those content distribution networks have nothing to do with the content of the actual website that was visited.
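
The ambiguity can be shown with a toy lookup built only from the names found above. This is purely illustrative: the table and function are made up for this post, and a real address could host far more sites than we happened to find.

```python
# Sites observed on one address in the examples above.
SITES_BY_IP = {
    "91.189.89.88": ["www.ubuntu.com", "kubuntu.org", "canonical.com"],
}


def candidate_sites(dst_ip: str, dst_port: int) -> list:
    """Best a flow record allows: every known site on that address."""
    if dst_port != 80:
        return []
    return SITES_BY_IP.get(dst_ip, [])


def visited_site(host_header: str) -> str:
    """With the HTTP Host header in hand, the answer is unambiguous."""
    return host_header.strip().lower()
```

From the flow alone we get a list of candidates; from the Host header we get one name.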

The bottom line is this: if you want to track what websites are visited by users on a network, NetFlow v5 isn't the best tool, or even a good one. A web proxy (e.g., Squid) or a web content filter (e.g., Websense or Cisco WSA) is probably the best tool, since these track not only HTTP host headers but also (usually) the Active Directory username associated with the request.

Other tools that could do the job are security-related tools like httpry or Bro-IDS, both of which have features for HTTP request tracking. Both are available in the excellent Security Onion Linux distribution.

[Edited to add] The anonymous commenter below observes that nProbe exports HTTP header information via IPFIX, and notes that some vendors have firewalls that do so as well. nProbe is an excellent free tool that takes a raw packet stream and converts it to NetFlow or IPFIX export format.

Comments:

Anonymous said...: I think you need to look at some of the most recent advances for Next Generation firewalls such as SonicWALL and Palo Alto. Both of these devices deliver rich deep packet information via IPFIX. nProbe can also listen and export HTTP URL information.
Here is some good info on SonicWALL's implementation.
http://www.sonicwall.com/us/products/Scrutinizer.html
Also, here is information on exporting URLs from nProbe:
http://www.plixer.com/blog/network-traffic-analysis/how-to-configure-nprobe-to-export-urls-and-latency-via-netflow/

SonicWALL will also export the username of the person doing the surfing if you have SSO turned on.
My recent experiences and conversation with hardware vendors all seem to point to more and more DPI information being exported by Netflow/IPFIX.; April 9, 2012 at 7:15 AM

Thursday, April 5, 2012
