Friday, June 20, 2014

Python Sets: Handy for Network Data

My Python-related posts seem to get the most reads, so here's another one!

A problem that comes up fairly often in networking is finding the number of occurrences of unique items in a large collection of data: let's say you want to find all of the unique IP addresses that accessed a website, traversed a firewall, got denied by an ACL, or whatever. Maybe you've extracted the following list from a log file:

1.1.1.1
2.2.2.2
3.3.3.3
1.1.1.1
5.5.5.5
5.5.5.5
1.1.1.1
2.2.2.2
...

and you need to reduce this to:

1.1.1.1
2.2.2.2
3.3.3.3
5.5.5.5

In other words, we're removing the duplicates. In low-level programming languages, removing duplicates is a bit of a pain: generally you need to implement an efficient way to sort an array of items, then traverse the sorted array to check for adjacent duplicates and remove them. In a language that has dictionaries (also known as hash tables or associative arrays), you can do it by adding each item as a key in your dictionary with an empty value, then extract the keys. In Python:

>>> items = ['1.1.1.1','2.2.2.2','3.3.3.3','1.1.1.1','5.5.5.5','5.5.5.5','1.1.1.1','2.2.2.2']
>>> d = {}
>>> for item in items:
...     d[item] = None
...
>>> d
{'5.5.5.5': None, '3.3.3.3': None, '1.1.1.1': None, '2.2.2.2': None}
>>> unique = d.keys()
>>> unique
['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2']


or, more concisely using a dictionary comprehension:

>>> {item:None for item in items}.keys()
['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2']


Python has an even better way, however: the "set" type, which emulates the mathematical idea of a set as a collection of distinct items. If you create an empty set and add items to it, duplicates will automatically be thrown away:

>>> s = set()
>>> s.add('1.1.1.1')
>>> s
set(['1.1.1.1'])
>>> s.add('2.2.2.2')
>>> s.add('1.1.1.1')
>>> s
set(['1.1.1.1', '2.2.2.2'])
>>> for item in items:
...     s.add(item)
...
>>> s
set(['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2'])


Predictably, you can use set comprehensions just like list comprehensions to do the same thing as a one liner:

>>> {item for item in items}
set(['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2'])


Or, if you have a list built already you can just convert it to a set:

>>> set(items)
set(['5.5.5.5', '3.3.3.3', '1.1.1.1', '2.2.2.2'])


Python also provides methods for the most common types of set operations: union, intersection, difference and symmetric difference. Because these methods accept lists or other iterables, you can quickly find similarities between collections of items:

>>> items
['1.1.1.1', '2.2.2.2', '3.3.3.3', '1.1.1.1', '5.5.5.5', '5.5.5.5', '1.1.1.1', '2.2.2.2']
>>> more_items = ['1.1.1.1','8.8.8.8','1.1.1.1','7.7.7.7','2.2.2.2']
>>> set(items).intersection(more_items)
set(['1.1.1.1', '2.2.2.2'])

>>> set(items).difference(more_items)
set(['5.5.5.5', '3.3.3.3'])


Have fun!

55 comments:

  1. Hi Jay,

    Another cool think I tend to use for similar purposes is the "collections.Counter". This will give you both a unique set of keys *and* a count of each item.

    For example, I might want to gather all the items from a device inventory that has serial-numbers, and then get a listing and count.

    >> from collections import Counter

    In the snippet below, the variable "sn" is a Table object (created via the Junos PyEZ library). The "sn" variable is iterable (like a dictionary), so I can use it to build a list via compression and then pass that as the input to the Counter constructor:

    >>> catalog = Counter([item.desc for item in sn])
    >>>

    You can dump the entire collection using the "items()" method:

    >>> # pretty-print the catalog items
    >>> pprint( catalog.items() )
    [('XFP-10G-SR', 2),
    ('MX FPC Type 2', 1),
    ('MX SCB', 2),
    ('8x 1GE(LAN), IQ2', 1),
    ('RE-S-1800x4', 2),
    ('Front Panel Display', 1),
    ('SFP-T', 2),
    ('DPCE 40x 1GE R', 1),
    ('DPC PMB', 4),
    ('DPCE 40x 1GE R EQ', 1),
    ('SFP-SX', 26),
    ('PS 1.2-1.7kW; 100-240V AC in', 3),
    ('DPCE 20x 1GE + 2x 10GE R', 1),
    ('MX480', 1),
    ('MX480 Midplane', 1)]

    Or access a specific item by name:

    >>> catalog['SFP-SX']
    26

    Hope this helps!

    ReplyDelete
  2. collections.Counter is awesome. You saved me from writing a post about it!

    sets, collections and itertools are probably my favorite "secrets" from the standard library.

    ReplyDelete
  3. Good stuff Jay. Sets are definitely underrated especially when combined with the set operations.

    I used a set difference yesterday. Two email lists and I wanted to only send to the individuals that were in list A that were not in list B. Sets made this much easier.

    I also liked your dictionary comprehensions and set comprehensions.

    ReplyDelete
  4. If you want create unique list try:

    In [1]: list(set(['123', '213', '123']))
    Out[1]: ['123', '213']

    ReplyDelete
  5. You post is really helpful in finding an IP address. You are a genius and helping us all to understand the computer errors and problems.

    ReplyDelete
  6. I am so proud of you and your efforts and work make me realize that anything can be done with patience and sincerity. Well I am here to say that your work has inspired me without a doubt.
    java training in marathahalli | java training in btm layout

    java training in jayanagar | java training in electronic city

    java training in chennai | java training in USA

    selenium training in chennai

    ReplyDelete
  7. This looks absolutely perfect. All these tiny details are made with lot of background knowledge. I like it a lot. 
    python training in pune
    python online training
    python training in OMR

    ReplyDelete
  8. I recently came across your blog and have been reading along. I thought I would leave my first comment.

    Blue Prism Training Course in Pune

    Blue Prism Training Institute in Bangalore

    ReplyDelete
  9. Your good knowledge and kindness in playing with all the pieces were very useful. I don’t know what I would have done if I had not encountered such a step like this.

    angularjs-Training in sholinganallur

    angularjs-Training in velachery

    angularjs Training in bangalore

    angularjs Training in bangalore

    angularjs Training in btm

    ReplyDelete
  10. Brilliant ideas that you have share with us.It is really help me lot and i hope it will help others also.update more different ideas with us.
    Java Training in Kelambakkam
    Java Training in Ashok Nagar
    Java Training in Nolambur
    Java Training center in Bangalore

    ReplyDelete
  11. For Devops Training in Bangalore Visit:
    Devops Training in Bangalore

    ReplyDelete
  12. I learned World's Trending Technology from certified experts for free of cost. I Got a job in decent Top MNC Company with handsome 14 LPA salary, I have learned the World's Trending Technology from Python training in pune experts who know advanced concepts which can help to solve any type of Real-time issues in the field of Python. Really worth trying instant approval blog commenting sites

    ReplyDelete
  13. Hey Nice Blog!! Thanks For Sharing!!! Wonderful blog & good post. It is really very helpful to me, waiting for a more new post. Keep Blogging ! Here is the best angular training with free Bundle videos .

    contact No :- 9885022027.

    ReplyDelete
  14. This comment has been removed by the author.

    ReplyDelete
  15. This comment has been removed by the author.

    ReplyDelete
  16. Thanks for sharing.I appreciate your efforts for producing such high quality content.Visit big data certification bangalore if you are looking for any big data related certification courses

    ReplyDelete
  17. Being new to the blogging world I feel like there is still so much to learn. Your tips helped to clarify a few things for me as well as giving..thanks
    Ai & Artificial Intelligence Course in Chennai
    PHP Training in Chennai
    Ethical Hacking Course in Chennai Blue Prism Training in Chennai
    UiPath Training in Chennai

    ReplyDelete
  18. This comment has been removed by the author.

    ReplyDelete
  19. Get amazing digital marketing services from the Best Digital Marketing Company in Zirakpur. Boost your sales, leads, and traffic at a very affordable price.


    Our Services:

    Website development Company in Zirakpur

    Application development Company in Zirakpur

    Graphic designing Company in Zirakpur

    Social Media Company in Zirakpur



    Contact Information:


    Name: GS Web Technologies: Website, Mobile App Development, Graphic Designing and Digital Marketing Company in Zirakpur, India.


    Address: 3rd Floor, Paras Down Town Square Mall, Zirakpur, Punjab – 140603, India.


    Phone Number: +91- 9501406707

    ReplyDelete

  20. Nice Post... waiting for your next post. I have learned some new information. thanks for sharing.

    performance marketing

    ReplyDelete