Friday, June 20, 2014

Python Sets: Handy for Network Data

My Python-related posts seem to get the most reads, so here's another one!

A problem that comes up fairly often in networking is finding the unique items in a large collection of data. Say you want to find all of the unique IP addresses that accessed a website, traversed a firewall, or got denied by an ACL. Maybe you've extracted the following list from a log file:

and you need to reduce this to:

In other words, we're removing the duplicates. In low-level programming languages, removing duplicates is a bit of a pain: generally you need to implement an efficient way to sort an array of items, then traverse the sorted array to check for adjacent duplicates and remove them. In a language that has dictionaries (also known as hash tables or associative arrays), you can do it by adding each item as a key in your dictionary with an empty value, then extracting the keys. In Python:

>>> items = ['','','','','','','','']
>>> d = {}
>>> for item in items:
...     d[item] = None
>>> d
{'': None, '': None, '': None, '': None}
>>> unique = d.keys()
>>> unique
['', '', '', '']

or, more concisely using a dictionary comprehension:

>>> {item:None for item in items}.keys()
['', '', '', '']
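The transcripts above come from Python 2, where keys() returns a list. Here is a minimal sketch of the same dictionary trick for Python 3.7 or later, where keys() returns a view and dictionaries keep insertion order. The addresses are hypothetical samples from the 192.0.2.0/24 documentation range, not the ones from the original log:

```python
# Hypothetical sample data: addresses from the 192.0.2.0/24 documentation range.
items = ['192.0.2.1', '192.0.2.7', '192.0.2.1', '192.0.2.9',
         '192.0.2.7', '192.0.2.1', '192.0.2.20', '192.0.2.9']

# Same idea as the loop above: every item becomes a key with an empty value.
d = {item: None for item in items}

# In Python 3, keys() returns a view rather than a list, so build a list explicitly.
unique = list(d)

# dict.fromkeys() builds the same dictionary in a single call.
unique_fromkeys = list(dict.fromkeys(items))

print(unique)
# ['192.0.2.1', '192.0.2.7', '192.0.2.9', '192.0.2.20']
```

A side benefit of the dictionary approach on Python 3.7+ is that the unique items come back in the order they were first seen.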

Python has an even better way, however: the "set" type, which emulates the mathematical idea of a set as a collection of distinct items. If you create an empty set and add items to it, duplicates will automatically be thrown away:

>>> s = set()
>>> s.add('')
>>> s
>>> s.add('')
>>> s.add('')
>>> s
set(['', ''])
>>> for item in items:
...     s.add(item)
>>> s
set(['', '', '', ''])
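As a small self-contained sketch of the same behavior, assuming hypothetical sample addresses from the documentation range: adding an item that is already present leaves the set unchanged. Note that Python 3 prints sets with braces, such as {'192.0.2.1'}, rather than the set([...]) form shown in the transcripts.

```python
# Hypothetical sample addresses (documentation range), not taken from a real log.
s = set()
s.add('192.0.2.1')
s.add('192.0.2.7')
s.add('192.0.2.1')  # duplicate: the set is left unchanged

print(len(s))             # 2
print('192.0.2.1' in s)   # True: membership tests are cheap on sets
print('192.0.2.99' in s)  # False
```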

Predictably, you can use set comprehensions just like list comprehensions to do the same thing as a one-liner:

>>> {item for item in items}
set(['', '', '', ''])

Or, if you already have a list built, you can just convert it to a set:

>>> set(items)
set(['', '', '', ''])
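Sets are unordered, so if you need the unique addresses in a predictable order you have to sort them afterwards. The sketch below assumes Python 3.3 or later, which ships the standard-library ipaddress module, and uses hypothetical sample addresses. Sorting with ipaddress.ip_address as the key compares addresses numerically instead of as strings:

```python
import ipaddress

# Hypothetical sample data from the 192.0.2.0/24 documentation range.
items = ['192.0.2.20', '192.0.2.1', '192.0.2.7',
         '192.0.2.1', '192.0.2.9', '192.0.2.7']

unique = set(items)

# A plain string sort puts '192.0.2.20' before '192.0.2.7'.
as_strings = sorted(unique)

# Sorting by the parsed address gives numeric order instead.
ordered = sorted(unique, key=ipaddress.ip_address)

print(ordered)
# ['192.0.2.1', '192.0.2.7', '192.0.2.9', '192.0.2.20']
```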

Python also provides methods for the most common types of set operations: union, intersection, difference and symmetric difference. Because these methods accept lists or other iterables, you can quickly find similarities between collections of items:

>>> items
['', '', '', '', '', '', '', '']
>>> more_items = ['','','','','']
>>> set(items).intersection(more_items)
set(['', ''])

>>> set(items).difference(more_items)
set(['', ''])
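For completeness, here is a sketch of all four operations side by side, again with hypothetical sample addresses. The method forms accept any iterable, while the operator forms (&, -, |, ^) require a set on both sides:

```python
# Hypothetical sample data from the 192.0.2.0/24 documentation range.
items = ['192.0.2.1', '192.0.2.7', '192.0.2.9', '192.0.2.20']
more_items = ['192.0.2.7', '192.0.2.20', '192.0.2.33']

a = set(items)

common = a.intersection(more_items)             # in both collections
only_a = a.difference(more_items)               # in items but not more_items
either = a.union(more_items)                    # in at least one collection
one_side = a.symmetric_difference(more_items)   # in exactly one collection

# Operator equivalents need a set on the right-hand side too.
b = set(more_items)
same_results = (common == a & b and only_a == a - b and
                either == a | b and one_side == a ^ b)

print(sorted(common))    # ['192.0.2.20', '192.0.2.7']
print(sorted(only_a))    # ['192.0.2.1', '192.0.2.9']
print(same_results)      # True
```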

Have fun!


Jeremy Schulman said...

Hi Jay,

Another cool thing I tend to use for similar purposes is "collections.Counter". This will give you both a unique set of keys *and* a count of each item.

For example, I might want to gather all the items from a device inventory that has serial-numbers, and then get a listing and count.

>>> from collections import Counter

In the snippet below, the variable "sn" is a Table object (created via the Junos PyEZ library). The "sn" variable is iterable (like a dictionary), so I can use it to build a list via a comprehension and then pass that as the input to the Counter constructor:

>>> catalog = Counter([item.desc for item in sn])

You can dump the entire collection using the "items()" method:

>>> from pprint import pprint
>>> # pretty-print the catalog items
>>> pprint( catalog.items() )
[('XFP-10G-SR', 2),
('MX FPC Type 2', 1),
('MX SCB', 2),
('8x 1GE(LAN), IQ2', 1),
('RE-S-1800x4', 2),
('Front Panel Display', 1),
('SFP-T', 2),
('DPCE 40x 1GE R', 1),
('DPC PMB', 4),
('DPCE 40x 1GE R EQ', 1),
('SFP-SX', 26),
('PS 1.2-1.7kW; 100-240V AC in', 3),
('DPCE 20x 1GE + 2x 10GE R', 1),
('MX480', 1),
('MX480 Midplane', 1)]

Or access a specific item by name:

>>> catalog['SFP-SX']
26

Hope this helps!

Jay Swan said...

collections.Counter is awesome. You saved me from writing a post about it!

sets, collections and itertools are probably my favorite "secrets" from the standard library.

Kirk Byers said...

Good stuff, Jay. Sets are definitely underrated, especially when combined with the set operations.

I used a set difference yesterday. I had two email lists and wanted to send only to the individuals who were in list A but not in list B. Sets made this much easier.

I also liked your dictionary comprehensions and set comprehensions.

Anonymous said...

If you want to create a unique list, try:

In [1]: list(set(['123', '213', '123']))
Out[1]: ['123', '213']
