Friday, June 20, 2014

Python Sets: Handy for Network Data

My Python-related posts seem to get the most reads, so here's another one!

A problem that comes up fairly often in networking is finding the number of occurrences of unique items in a large collection of data: let's say you want to find all of the unique IP addresses that accessed a website, traversed a firewall, got denied by an ACL, or whatever. Maybe you've extracted the following list from a log file:

1.1.1.1
2.2.2.2
3.3.3.3
1.1.1.1
5.5.5.5
5.5.5.5
1.1.1.1
2.2.2.2
...

and you need to reduce this to:

1.1.1.1
2.2.2.2
3.3.3.3
5.5.5.5

In other words, we're removing the duplicates. In low-level programming languages, removing duplicates is a bit of a pain: generally you need to implement an efficient way to sort an array of items, then traverse the sorted array to check for adjacent duplicates and remove them. In a language that has dictionaries (also known as hash tables or associative arrays), you can do it by adding each item as a key in your dictionary with an empty value, then extract the keys. In Python:

>>> items = ['1.1.1.1','2.2.2.2','3.3.3.3','1.1.1.1','5.5.5.5','5.5.5.5','1.1.1.1','2.2.2.2']
>>> d = {}
>>> for item in items:
...     d[item] = None
...
>>> d
{'5.5.5.5': None, '3.3.3.3': None, '1.1.1.1': None, '2.2.2.2': None}
>>> unique = d.keys()
>>> unique
['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2']


or, more concisely using a dictionary comprehension:

>>> {item:None for item in items}.keys()
['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2']


Python has an even better way, however: the "set" type, which emulates the mathematical idea of a set as a collection of distinct items. If you create an empty set and add items to it, duplicates will automatically be thrown away:

>>> s = set()
>>> s.add('1.1.1.1')
>>> s
set(['1.1.1.1'])
>>> s.add('2.2.2.2')
>>> s.add('1.1.1.1')
>>> s
set(['1.1.1.1', '2.2.2.2'])
>>> for item in items:
...     s.add(item)
...
>>> s
set(['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2'])


Predictably, you can use set comprehensions just like list comprehensions to do the same thing as a one liner:

>>> {item for item in items}
set(['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2'])


Or, if you have a list built already you can just convert it to a set:

>>> set(items)
set(['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2'])


Python also provides methods for the most common types of set operations: union, intersection, difference and symmetric difference. Because these methods accept lists or other iterables, you can quickly find similarities between collections of items:

>>> items
['1.1.1.1', '2.2.2.2', '3.3.3.3', '1.1.1.1', '5.5.5.5', '5.5.5.5', '1.1.1.1', '2.2.2.2']
>>> more_items = ['1.1.1.1','8.8.8.8','1.1.1.1','7.7.7.7','2.2.2.2']
>>> set(items).intersection(more_items)
set(['1.1.1.1', '2.2.2.2'])

>>> set(items).difference(more_items)
set(['5.5.5.5', '3.3.3.3'])


Have fun!

19 comments:

Jeremy Schulman said...

Hi Jay,

Another cool think I tend to use for similar purposes is the "collections.Counter". This will give you both a unique set of keys *and* a count of each item.

For example, I might want to gather all the items from a device inventory that has serial-numbers, and then get a listing and count.

>> from collections import Counter

In the snippet below, the variable "sn" is a Table object (created via the Junos PyEZ library). The "sn" variable is iterable (like a dictionary), so I can use it to build a list via compression and then pass that as the input to the Counter constructor:

>>> catalog = Counter([item.desc for item in sn])
>>>

You can dump the entire collection using the "items()" method:

>>> # pretty-print the catalog items
>>> pprint( catalog.items() )
[('XFP-10G-SR', 2),
('MX FPC Type 2', 1),
('MX SCB', 2),
('8x 1GE(LAN), IQ2', 1),
('RE-S-1800x4', 2),
('Front Panel Display', 1),
('SFP-T', 2),
('DPCE 40x 1GE R', 1),
('DPC PMB', 4),
('DPCE 40x 1GE R EQ', 1),
('SFP-SX', 26),
('PS 1.2-1.7kW; 100-240V AC in', 3),
('DPCE 20x 1GE + 2x 10GE R', 1),
('MX480', 1),
('MX480 Midplane', 1)]

Or access a specific item by name:

>>> catalog['SFP-SX']
26

Hope this helps!

Jay Swan said...

collections.Counter is awesome. You saved me from writing a post about it!

sets, collections and itertools are probably my favorite "secrets" from the standard library.

Kirk Byers said...

Good stuff Jay. Sets are definitely underrated especially when combined with the set operations.

I used a set difference yesterday. Two email lists and I wanted to only send to the individuals that were in list A that were not in list B. Sets made this much easier.

I also liked your dictionary comprehensions and set comprehensions.

Anonymous said...

If you want create unique list try:

In [1]: list(set(['123', '213', '123']))
Out[1]: ['123', '213']

custom essay said...

You post is really helpful in finding an IP address. You are a genius and helping us all to understand the computer errors and problems.

Gokul Ravi said...

very nice interview questions
vlsi interview questions
extjs interview questions
laravel interview questions
sap bi/bw interview questions
pcb interview questions
unix shell scripting interview questions

Gokul Ravi said...

really awesome blog
hr interview questions
hibernate interview questions
selenium interview questions
c interview questions
c++ interview questions
linux interview questions


Gokul Ravi said...

thanks for sharing this blog
spring mvc interview questions
machine learning online training
servlet interview questions mytectra.in
wcf interview questions


Gokul Ravi said...

nice blog
android training in bangalore
ios training in bangalore

viswanath said...

AWS Training in Bangalore - Live Online & Classroom
myTectra Amazon Web Services (AWS) certification training helps you to gain real time hands on experience on AWS. myTectra offers AWS training in Bangalore using classroom and AWS Online Training globally. AWS Training at myTectra delivered by the experienced professional who has atleast 4 years of relavent AWS experince and overall 8-15 years of IT experience. myTectra Offers AWS Training since 2013 and retained the positions of Top AWS Training Company in Bangalore and India.

IOT Training in Bangalore - Live Online & Classroom
IOT Training course observes iot as the platform for networking of different devices on the internet and their inter related communication. Reading data through the sensors and processing it with applications sitting in the cloud and thereafter passing the processed data to generate different kind of output is the motive of the complete curricula. Students are made to understand the type of input devices and communications among the devices in a wireless media.

gowsalya said...

I wish to show thanks to you just for bailing me out of this particular trouble.As a result of checking through the net and meeting techniques that were not productive, I thought my life was done.
Digital Marketing Training in Chennai

Digital Marketing Training in Bangalore
Digital Marketing Training in Pune

Saro said...


Great thoughts you got there, believe I may possibly try just some of it throughout my daily life.

rpa Training in Chennai

rpa Training in bangalore

rpa Training in pune

blueprism Training in Chennai

blueprism Training in bangalore

blueprism Training in pune

rpa online training

digi mark said...

Great thoughts you got there, believe I may possibly try just some of it throughout my daily life.

rpa Training in Chennai

rpa Training in bangalore

rpa Training in pune

blueprism Training in Chennai

blueprism Training in bangalore

blueprism Training in pune

rpa online training

shalinipriya said...

Thank you a lot for providing individuals with a very spectacular possibility to read critical reviews from this site.
Data Science with Python training in chenni
Data Science training in chennai
Data science training in velachery
Data science training in tambaram
Data Science training in OMR
Data Science training in anna nagar
Data Science training in chennai
Data science training in Bangalore

simbu said...

I am so proud of you and your efforts and work make me realize that anything can be done with patience and sincerity. Well I am here to say that your work has inspired me without a doubt.
java training in marathahalli | java training in btm layout

java training in jayanagar | java training in electronic city

java training in chennai | java training in USA

selenium training in chennai

Mouni yoga said...

This looks absolutely perfect. All these tiny details are made with lot of background knowledge. I like it a lot. 
python training in pune
python online training
python training in OMR

ummayasri said...

I recently came across your blog and have been reading along. I thought I would leave my first comment.

Blue Prism Training Course in Pune

Blue Prism Training Institute in Bangalore

chitra pragya said...

Your good knowledge and kindness in playing with all the pieces were very useful. I don’t know what I would have done if I had not encountered such a step like this.

angularjs-Training in sholinganallur

angularjs-Training in velachery

angularjs Training in bangalore

angularjs Training in bangalore

angularjs Training in btm

cynthia williams said...

Useful content, I have bookmarked this page for my future reference.
RPA Training in Chennai
Robotics Process Automation Training in Chennai
RPA courses in Chennai
RPA Training
RPA course