Proof of Concept - Templated Table Export Formatter

A proof of concept demonstration inspired by the Template export option provided by OpenRefine.

An export format template is defined in five parts in a single gist:

prefix.txt - the header part of the template; assumed fixed in this implementation
rowtemplate.txt - the row template; applied to each row in turn
rowseparator.txt - the row separator, placed between each templated row output; assumed fixed in this implementation
suffix.txt - the footer part of the template; assumed fixed in this implementation
colNames.json - a file that identifies the column names used in the template and gives a description of each
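As a rough illustration of the five parts, the sketch below builds a local stand-in using the same filename-to-content shape that the GitHub gists API returns under its files key. The template text here is hypothetical and is not copied from the example gist; missing_parts and REQUIRED_PARTS are illustrative names only.

```python
import json

# The five files a template gist is expected to contain.
REQUIRED_PARTS = ("prefix.txt", "rowtemplate.txt", "rowseparator.txt",
                  "suffix.txt", "colNames.json")

# Hypothetical template parts; the real gist's contents may differ.
template_files = {
    "prefix.txt": {"content": '{"features": ['},
    "rowtemplate.txt": {"content": '{"id": "{{Place}}", "lon": {{Lon}}, "lat": {{Lat}}}'},
    "rowseparator.txt": {"content": ","},
    "suffix.txt": {"content": '], "type": "FeatureCollection"}'},
    "colNames.json": {"content": json.dumps({
        "Lat": "the name of the latitude column",
        "Lon": "the name of the longitude column",
        "Place": "the name of the location",
    })},
}

def missing_parts(files):
    """Return the required template files that are absent from `files`."""
    return [name for name in REQUIRED_PARTS if name not in files]

print(missing_parts(template_files))
```

A check like this could be run before applying a template, so that an incomplete gist is reported by filename rather than surfacing later as a KeyError.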

An example gist can be found here: https://gist.github.com/psychemedia/ddb4b862fc65be2aeb3b

This implementation demonstrates how to apply the template table export formatter to a Python/pandas dataframe.

In [176]:

import pandas as pd
import requests, pystache, json

Define a function to download the gist and extract the files into a dict.

In [185]:

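def getTemplateFiles(gist):
    #Accept either a bare gist id or a full GitHub API gist URL
    gist=str(gist)
    if not gist.startswith('https://api.github.com/gists/'):
        gist='https://api.github.com/gists/'+gist
    r = requests.get(gist)
    #Fail early on HTTP errors rather than with a KeyError on 'files'
    r.raise_for_status()
    return r.json()['files']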
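The function above accepts a bare gist id or an API URL, but not a gist web page URL such as the example linked earlier. One possible way to normalise all three forms, without making any network call, is sketched below; gist_api_url is a hypothetical helper, not part of the original notebook, and it assumes the gist id is the final path segment of a web URL.

```python
GIST_API = "https://api.github.com/gists/"

def gist_api_url(gist):
    """Map a gist id, gist web URL, or gist API URL to the API URL."""
    gist = str(gist).strip()
    if gist.startswith(GIST_API):
        return gist
    # Assumption: for a web URL the gist id is the last path segment.
    gist_id = gist.rstrip("/").rsplit("/", 1)[-1]
    return GIST_API + gist_id

print(gist_api_url("https://gist.github.com/psychemedia/ddb4b862fc65be2aeb3b"))
```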

Define a function to display the colNames.json file.

In [179]:

def showColNames(gist):
    tfiles=getTemplateFiles(gist)
    return json.loads(tfiles['colNames.json']['content'])

Create a demo data frame - the demo template creates a geojson file from a table.

In [181]:

df=pd.DataFrame({"Place":["London, UK","Cambridge, UK","Paris, France"],
                "Longitude":[-0.1276597,0.124862,2.3521334],
                "Latitude":[51.5072759, 52.2033051 , 48.8565056] })
df

Out[181]:

	Latitude	Longitude	Place
0	51.507276	-0.127660	London, UK
1	52.203305	0.124862	Cambridge, UK
2	48.856506	2.352133	Paris, France

3 rows × 3 columns

Check which column names the template expects, along with a description of the role each column takes. Maybe the JSON file should include additional information, such as the expected type of each column?

In [180]:

showColNames('ddb4b862fc65be2aeb3b')

Out[180]:

{'Lat': 'the name of the latitude column',
 'Place': 'the name of the location',
 'Lon': 'the name of the longitude column'}
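The question raised above, whether colNames.json should also carry the expected type of each column, could be explored along the following lines. This is only a sketch: the extended JSON layout (a description and a type per column), the type names "number" and "string", and the check_row helper are all assumptions rather than anything defined by the example gist. Rows are represented as plain dicts, as produced by something like df.to_dict('records').

```python
import json
from numbers import Real

# Hypothetical extended colNames.json: a description plus an expected type.
col_spec_json = """
{
  "Lat":   {"description": "the name of the latitude column",  "type": "number"},
  "Lon":   {"description": "the name of the longitude column", "type": "number"},
  "Place": {"description": "the name of the location",         "type": "string"}
}
"""

# Assumed type vocabulary; bool is excluded from "number" on purpose.
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, Real) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}

def check_row(row, spec):
    """Return a list of problems found when comparing one row dict to the spec."""
    problems = []
    for name, info in spec.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not TYPE_CHECKS[info["type"]](row[name]):
            problems.append(f"{name}: expected {info['type']}")
    return problems

spec = json.loads(col_spec_json)
print(check_row({"Lat": 51.5072759, "Lon": -0.1276597, "Place": "London, UK"}, spec))
```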

Define a function to apply the template. The function expects:

a pandas dataframe;
a gist URL or identifier;
a dict of column renames, mapping current column names to the names the template expects, if necessary.

In [182]:

def parser(df,gist,renames):
    df = df.rename(columns=renames)
    txt=''
    tfiles=getTemplateFiles(gist)
    
    #Currently blocks are separated by a \n
    #Should we also define a common or separate block separators?
    #If so, should separators be in separate files or a single JSON file?
    txt=tfiles['prefix.txt']['content']+'\n'
    rt=tfiles['rowtemplate.txt']['content']
    rows=[]
    #Should we try to indent every line of text with an additional tab here?
    #df.ix has been removed from pandas; iterate over the rows as dicts instead
    for cells in df.to_dict('records'):
        rows.append(pystache.render(rt, cells))
    txt=txt+'\n'+tfiles['rowseparator.txt']['content'].join(rows)
    txt=txt+'\n'+tfiles['suffix.txt']['content']
    return txt
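The assembly steps (rename columns, render each row, join with the separator, wrap in prefix and suffix) can also be tried offline. The sketch below is not the notebook's implementation: it swaps pystache for a plain regular-expression substitution of {{name}} tags, which covers none of Mustache's other features (sections, escaping), and it takes rows as a list of dicts rather than a dataframe. The names render_row and apply_template and the template parts are hypothetical.

```python
import re

def render_row(template, cells):
    """Replace each {{name}} tag with str(cells[name]); a stand-in for pystache.render."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(cells[m.group(1)]), template)

def apply_template(records, parts, renames=None):
    """Render every record with the row template and wrap the result."""
    renames = renames or {}
    rows = []
    for record in records:
        # Map local column names to the names the template expects.
        cells = {renames.get(key, key): value for key, value in record.items()}
        rows.append(render_row(parts["rowtemplate"], cells))
    body = parts["rowseparator"].join(rows)
    # Blocks are joined with a single newline in this sketch.
    return parts["prefix"] + "\n" + body + "\n" + parts["suffix"]

# Hypothetical template parts, much simpler than the example gist.
parts = {
    "prefix": "[",
    "rowtemplate": '{"id": "{{Place}}", "lon": {{Lon}}}',
    "rowseparator": ",",
    "suffix": "]",
}
records = [
    {"Place": "London, UK", "Longitude": -0.1276597},
    {"Place": "Paris, France", "Longitude": 2.3521334},
]
print(apply_template(records, parts, {"Longitude": "Lon"}))
```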

In [183]:

print(parser(df,'ddb4b862fc65be2aeb3b',{'Latitude': 'Lat', 'Longitude': 'Lon'}))

{"features": [

{"geometry": 
        {   "coordinates": [ -0.1276597,
                51.5072759
            ],
            "type": "Point"},
            "id": "London, UK",
            "properties": {}, "type": "Feature"
    },{"geometry": 
        {   "coordinates": [ 0.124862,
                52.2033051
            ],
            "type": "Point"},
            "id": "Cambridge, UK",
            "properties": {}, "type": "Feature"
    },{"geometry": 
        {   "coordinates": [ 2.3521334,
                48.8565056
            ],
            "type": "Point"},
            "id": "Paris, France",
            "properties": {}, "type": "Feature"
    }
], "type": "FeatureCollection"}
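Because the template is plain text, nothing guarantees that the result is well-formed JSON. A quick sanity check is to parse the output with the standard json module, as in the sketch below. The sample string is a shortened, hand-written stand-in for the output above, and summarise_geojson is a hypothetical helper. Note that GeoJSON positions are ordered longitude first, then latitude, which matches the order used in the output.

```python
import json

def summarise_geojson(text):
    """Parse templated output and return its type and the feature ids.

    json.loads raises json.JSONDecodeError (a ValueError) if the
    template produced malformed JSON.
    """
    data = json.loads(text)
    return data["type"], [feature["id"] for feature in data["features"]]

# Shortened stand-in for the templated output, with a single feature.
sample = '''{"features": [
{"geometry": {"coordinates": [-0.1276597, 51.5072759], "type": "Point"},
 "id": "London, UK", "properties": {}, "type": "Feature"}
], "type": "FeatureCollection"}'''

print(summarise_geojson(sample))
```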

In [184]:

df

Out[184]:

	Latitude	Longitude	Place
0	51.507276	-0.127660	London, UK
1	52.203305	0.124862	Cambridge, UK
2	48.856506	2.352133	Paris, France

3 rows × 3 columns
