By @carnby.
This notebook showcases the basic matta visualizations, as well as their usage.
Note that the init_javascript call is not needed when running on a local server where the JavaScript code has been added to your IPython profile.
import pandas as pd
import networkx as nx
import matta
import json
import requests
from networkx.readwrite import json_graph
# we do this to load the required libraries when viewing on NBViewer
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')
Wordclouds are implemented using the d3.layout.cloud layout by Jason Davies. They work with bags of words, and the Python Counter class is well suited to building them.
hamlet = requests.get('http://www.gutenberg.org/cache/epub/2265/pg2265.txt').text
hamlet[0:100]
u"\ufeff***The Project Gutenberg's Etext of Shakespeare's First Folio***\r\n*********************The Tragedie"
import re
from collections import Counter
words = re.split(r'[\W]+', hamlet.lower())
counts = Counter(words)
df = pd.DataFrame.from_records(list(counts.items()), columns=['word', 'frequency'])
df.sort_values(['frequency'], ascending=False, inplace=True)
df.head()
| | word | frequency |
|---|---|---|
| 995 | the | 1108 |
| 1877 | and | 920 |
| 2656 | to | 762 |
| 2437 | of | 698 |
| 4951 | you | 593 |
matta.wordcloud(dataframe=df.head(500), text='word', font_size='frequency',
typeface='Helvetica', font_weight='bold',
font_color={'value': 'frequency', 'palette': 'cubehelix', 'scale': 'threshold'})
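re.split leaves an empty string wherever the text starts or ends with a separator (the Gutenberg file begins with a byte-order mark and asterisks), so the empty string can end up among the counted words. A minimal sketch of the same bag-of-words step that drops those tokens; the helpers bag_of_words and counts_to_frame are hypothetical names for illustration, not part of matta:

```python
import re
from collections import Counter

import pandas as pd


def bag_of_words(text):
    """Count lowercase word tokens, skipping the empty strings that
    re.split produces when the text starts or ends with a separator."""
    tokens = re.split(r'\W+', text.lower())
    return Counter(token for token in tokens if token)


def counts_to_frame(counts):
    """Return a word/frequency DataFrame sorted by descending frequency,
    the column layout this notebook passes to matta.wordcloud."""
    df = pd.DataFrame(list(counts.items()), columns=['word', 'frequency'])
    return df.sort_values('frequency', ascending=False).reset_index(drop=True)


sample = "***To be, or not to be: that is the question.***"
counts = bag_of_words(sample)
print(counts['to'], counts['be'], '' in counts)  # 2 2 False
frame = counts_to_frame(counts)
print(frame.head(3))
```

Filtering inside the Counter keeps the DataFrame free of a blank "word" row that would otherwise be drawn as an empty slot in the cloud.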
Treemaps use the Treemap Layout from d3.js. They work with trees, which we construct as a networkx.DiGraph.
flare_data = requests.get('https://gist.githubusercontent.com/mbostock/4063582/raw/a05a94858375bd0ae023f6950a2b13fac5127637/flare.json').json()
flare_data['name']
u'flare'
tree = nx.DiGraph()

def add_node(node):
    # Depth-first insertion: each node gets the next integer id, then its
    # children are added recursively and linked to it.
    node_id = tree.number_of_nodes() + 1
    attrs = {'name': node['name']}
    if 'size' in node:
        # only the leaves of flare.json carry a size
        attrs['size'] = node['size']
    tree.add_node(node_id, **attrs)
    for child in node.get('children', []):
        child_id = add_node(child)
        tree.add_edge(node_id, child_id)
    return node_id
root = add_node(flare_data)
# treemap requires this attribute
tree.graph['root'] = root
nx.is_arborescence(tree)
True
import seaborn as sns
matta.treemap(tree=tree, node_value='size', node_label='name',
node_color={'value': 'parent.name', 'scale': 'ordinal', 'palette': sns.husl_palette(15, l=.4, s=.9)})
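d3's treemap layout sizes each rectangle by the sum of the values beneath it, which is why only the leaves of flare.json need a size. A sketch of that aggregation on a small hypothetical tree (integer ids, as add_node produces), using a post-order traversal so that children are totaled before their parent. This mirrors the layout's behavior; it is not matta code:

```python
import networkx as nx

# Hypothetical tree: 1 -> {2, 5}, 2 -> {3, 4}; only leaves 3, 4, 5 have a size.
tree = nx.DiGraph()
tree.add_edges_from([(1, 2), (2, 3), (2, 4), (1, 5)])
leaf_size = {3: 3, 4: 5, 5: 4}
root = 1

totals = {}
# Post-order visits every child before its parent.
for node in nx.dfs_postorder_nodes(tree, root):
    children = list(tree.successors(node))
    if children:
        totals[node] = sum(totals[child] for child in children)
    else:
        totals[node] = leaf_size.get(node, 0)

print(nx.is_arborescence(tree), totals[root], totals[2])  # True 12 8
```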
Sankey or flow diagrams use the Sankey plugin by Mike Bostock. They work with digraphs, just like treemaps. Note that graphs with cycles are not supported.
sankey_data = requests.get('http://bost.ocks.org/mike/sankey/energy.json')
sankey_graph = json_graph.node_link_graph(json.loads(sankey_data.text), directed=True)
next(iter(sankey_graph.nodes(data=True))), next(iter(sankey_graph.edges(data=True)))
((0, {u'name': u"Agricultural 'waste'"}), (0, 1, {u'value': 124.729}))
matta.flow(graph=sankey_graph, node_label='name', link_weight='value', node_color='indigo',
node_width=12, node_padding=13,
link_color={'value': 'value', 'palette': 'Greys', 'scale': 'threshold'}, link_opacity=0.8)
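Since the Sankey layout cannot handle cycles, it can be worth checking a graph before passing it to matta.flow. A sketch with plain networkx; the helper check_flow_graph and its weight='value' default are assumptions for illustration. It also reports each node's throughput as the larger of its incoming and outgoing link totals, which, to our understanding, is how the d3 Sankey plugin sizes nodes:

```python
import networkx as nx


def check_flow_graph(graph, weight='value'):
    """Raise ValueError if `graph` has a cycle (self-loops included);
    otherwise return {node: max(total inflow, total outflow)}."""
    if not nx.is_directed_acyclic_graph(graph):
        cycle = nx.find_cycle(graph)
        raise ValueError('flow diagram needs an acyclic graph; found %r' % (cycle,))
    throughput = {}
    for node in graph.nodes():
        inflow = sum(d.get(weight, 0) for _, _, d in graph.in_edges(node, data=True))
        outflow = sum(d.get(weight, 0) for _, _, d in graph.out_edges(node, data=True))
        throughput[node] = max(inflow, outflow)
    return throughput


g = nx.DiGraph()
g.add_edge('coal', 'power', value=10)
g.add_edge('gas', 'power', value=5)
g.add_edge('power', 'homes', value=12)
throughput = check_flow_graph(g)
print(throughput)

cycle_error = None
try:
    check_flow_graph(nx.DiGraph([('a', 'b'), ('b', 'a')]))
except ValueError as err:
    cycle_error = str(err)
print(cycle_error)
```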
Parallel Coordinates are based on the code by Jason Davies. They work with a pandas.DataFrame.
df = pd.read_csv('http://bl.ocks.org/jasondavies/raw/1341281/cars.csv', index_col='name')
df.head()
| name | economy (mpg) | cylinders | displacement (cc) | power (hp) | weight (lb) | 0-60 mph (s) | year |
|---|---|---|---|---|---|---|---|
| AMC Ambassador Brougham | 13.0 | 8 | 360 | 175 | 3821 | 11.0 | 73 |
| AMC Ambassador DPL | 15.0 | 8 | 390 | 190 | 3850 | 8.5 | 70 |
| AMC Ambassador SST | 17.0 | 8 | 304 | 150 | 3672 | 11.5 | 72 |
| AMC Concord DL 6 | 20.2 | 6 | 232 | 90 | 3265 | 18.2 | 79 |
| AMC Concord DL | 18.1 | 6 | 258 | 120 | 3410 | 15.1 | 78 |
matta.parcoords(dataframe=df)
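In a parallel coordinates plot each column gets its own vertical axis, scaled between that column's minimum and maximum. A sketch of where each value lands on its axis, using min-max scaling in pandas; axis_positions is a hypothetical helper (matta draws the axes itself in d3.js), the rows are the first three from the table above, and the text column is made up to show that non-numeric columns are skipped:

```python
import pandas as pd


def axis_positions(df):
    """Map each numeric column to [0, 1]: 0 at the column minimum, 1 at its
    maximum. Non-numeric columns are dropped."""
    numeric = df.select_dtypes(include='number')
    low = numeric.min()
    span = numeric.max() - low
    # A constant column would divide by zero; give it a span of 1 instead.
    span = span.where(span != 0, 1)
    return (numeric - low) / span


cars = pd.DataFrame(
    {'economy (mpg)': [13.0, 15.0, 17.0],
     'cylinders': [8, 8, 8],
     'origin': ['US', 'US', 'US']},  # hypothetical text column
    index=['AMC Ambassador Brougham', 'AMC Ambassador DPL', 'AMC Ambassador SST'])
positions = axis_positions(cars)
print(positions)
```

Because every axis is rescaled independently, two lines crossing at the same height on different axes say nothing about the raw values being equal.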
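Parallel Sets are based on the code by Jason Davies. They work with the categorical columns of a pandas.DataFrame.
df = pd.read_csv('https://www.jasondavies.com/parallel-sets/titanic.csv')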
df.head()
| | Class | Age | Sex | Survived |
|---|---|---|---|---|
| 0 | Second Class | Child | Female | Survived |
| 1 | Second Class | Child | Female | Survived |
| 2 | Second Class | Child | Female | Survived |
| 3 | Second Class | Child | Female | Survived |
| 4 | Second Class | Child | Female | Survived |
matta.parsets(dataframe=df, columns=['Survived', 'Sex', 'Age', 'Class'])
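Parallel Sets draw one axis per categorical column and size each ribbon by the number of rows that share a combination of categories. The counts behind those ribbons can be inspected with a pandas groupby. The rows below are made up for illustration and are not the actual Titanic data:

```python
import pandas as pd

# Made-up rows in the same shape as titanic.csv.
passengers = pd.DataFrame({
    'Survived': ['Perished', 'Survived', 'Perished', 'Perished', 'Survived'],
    'Sex': ['Male', 'Female', 'Male', 'Female', 'Female'],
    'Class': ['Third Class', 'First Class', 'Third Class', 'Third Class', 'Second Class'],
})

# Rows per combination, in the same column order as the parsets call.
ribbons = passengers.groupby(['Survived', 'Sex', 'Class']).size()
print(ribbons)

# Totals on the first axis are the marginal counts of that column.
first_axis = passengers['Survived'].value_counts()
print(first_axis)
```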
Graphs from networkx (such as the undirected networkx.Graph used below, or a networkx.DiGraph) are visualized using the Force Layout in d3.js.
graph = nx.davis_southern_women_graph()
for node_id, attrs in graph.nodes(data=True):
    # 'bipartite' is 0 for women and 1 for events
    attrs['color'] = 'purple' if attrs['bipartite'] else 'green'
    # node size (used by node_ratio below) is the node's degree
    attrs['size'] = graph.degree(node_id)
matta.force(graph=graph, link_distance=100, height=600,
node_ratio='size',
node_color={'value': 'bipartite', 'scale': 'ordinal', 'palette': 'Set2'})
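The nodes of the Southern Women graph fall into two groups through the bipartite attribute (0 for women, 1 for events), and the sizes stored above are node degrees. A small sketch for inspecting both before tuning node_ratio or link_distance; it uses only standard networkx calls and does not depend on matta:

```python
import networkx as nx

graph = nx.davis_southern_women_graph()

# Split nodes by the 'bipartite' attribute: 0 for women, 1 for events.
women = [n for n, d in graph.nodes(data=True) if d['bipartite'] == 0]
events = [n for n, d in graph.nodes(data=True) if d['bipartite'] == 1]

# Degree is what the notebook stores as 'size' and passes via node_ratio.
degree = dict(graph.degree())
print(len(women), len(events))
print('most active woman:', max(women, key=degree.get))
print('best attended event:', max(events, key=degree.get))
```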