#!/usr/bin/env python # coding: utf-8 # #Pairinglist # Some tools or processes generate output, usually in some form of CSV output which holds a line entry for a given type of object, and any number of associated properties, all of the same type. This can be viewed as a compressed form of a table describing a many to many relationship, sometimes called a [junction table](https://en.wikipedia.org/wiki/Junction_table) or as I prefer in the context of this example, a pairing table. # ##Example # Consider a list of people and the operating systems they work with. I have included the trailing empty entries that can occur in csv files. # In[1]: example = [ ['Tom', 'Windows', 'Linux', ''], ['Dick', 'Windows', 'OSX', 'AIX'], ['Harry', 'Linux', 'BSD', ''] ] # ##2-dimensional table # Although this might originally be stored somehwere as a pairing table already, in a database somewhere, if we do not control the source, or whatever mechanism generates the output, how do we extract the underlying structure? Let's say we want to end up with a representation like this: # <table> # <tr> # <th></th> # <th>Windows</th> # <th>Linux</th> # <th>OSX</th> # <th>BSD</th> # <th>AIX</th> # </tr> # <tr> # <td>Tom</td> # <td>X</td> # <td>X</td> # <td></td> # <td></td> # <td></td> # </tr> # <tr> # <td>Dick</td> # <td>X</td> # <td></td> # <td>X</td> # <td></td> # <td>X</td> # </tr> # <tr> # <td>Harry</td> # <td></td> # <td>X</td> # <td></td> # <td>X</td> # <td></td> # </tr> # </table> # # Because the columns in the original data are not aligned we need to extract the original pairing table to enable a pivot transformation. # In[2]: def pairinglist(csvlist): # clear out the empty strings from csv conversion nonemptylist = map(lambda line: filter(None, line), csvlist) # generate all combinations as sublists of tuples pairings = map(lambda line: [(line[0], pairing) for pairing in line[1:]], nonemptylist) # flatten the sublists return [pairing for line in pairings for pairing in line] # In[3]: for entry in example: print entry # In[4]: for entry in pairinglist(example): print entry # ##Visualizations # The list of tuples generated by the function can be fed to a csvwriter, or some other tool. I tend to use Excel pivottables for this kind of data, but some python visualization could also be used.