A proof of concept demonstration inspired by the Template export option provided by OpenRefine.
An export format template is defined in five parts in a single gist:
- prefix.txt: the header part of the template; assumed fixed in this implementation
- rowtemplate.txt: the row template; applied to each row in turn
- rowseparator.txt: the row separator, separating each templated row output; assumed fixed in this implementation
- suffix.txt: the footer part of the template; assumed fixed in this implementation
- colNames.json: a file that identifies the column names used in the template and a description of them
An example gist can be found here: https://gist.github.com/psychemedia/ddb4b862fc65be2aeb3b
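To make the five parts concrete, here is a minimal offline sketch of how they might combine. The file contents below are hypothetical stand-ins (they are not copied from the example gist), and `render_row` is a deliberately tiny `{{name}}` substitution used in place of a full Mustache renderer so that the sketch needs only the standard library. `apply_template` is likewise a hypothetical name.

```python
import json
import re

# Hypothetical template parts; the real ones live in the gist files
template = {
    'prefix.txt': '{"features": [',
    'rowtemplate.txt': '{"geometry": {"coordinates": [{{Lon}}, {{Lat}}], "type": "Point"}, '
                       '"id": "{{Place}}", "properties": {}, "type": "Feature"}',
    'rowseparator.txt': ',',
    'suffix.txt': '], "type": "FeatureCollection"}',
}

def render_row(rowtemplate, cells):
    # Replace each {{name}} tag with the matching cell value (no escaping, no sections)
    return re.sub(r'\{\{(\w+)\}\}', lambda m: str(cells[m.group(1)]), rowtemplate)

def apply_template(template, rows):
    # prefix + rendered rows joined by the separator + suffix
    body = template['rowseparator.txt'].join(
        render_row(template['rowtemplate.txt'], row) for row in rows)
    return template['prefix.txt'] + body + template['suffix.txt']

rows = [
    {'Place': 'London, UK', 'Lon': -0.1276597, 'Lat': 51.5072759},
    {'Place': 'Paris, France', 'Lon': 2.3521334, 'Lat': 48.8565056},
]
geojson_text = apply_template(template, rows)
parsed = json.loads(geojson_text)  # confirms the combined text is valid JSON
print(len(parsed['features']))
```

Only the row template varies per row; the prefix, separator and suffix are emitted as fixed text, which matches the "assumed fixed" notes in the list.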
This implementation demonstrates how to apply the template table export formatter to a Python/pandas dataframe.
import pandas as pd
import requests, pystache, json
Define a function to download the gist and extract the files into a dict.
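def getTemplateFiles(gist):
    # Accept either a bare gist ID or a full GitHub API URL
    gist = str(gist)
    if not gist.startswith('https://api.github.com/gists/'):
        gist = 'https://api.github.com/gists/' + gist
    r = requests.get(gist)
    # 'files' maps each filename to a dict that includes the file's 'content'
    return r.json()['files']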
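The `files` object returned by the GitHub gists API is keyed by filename, and each entry carries the file text under `content`. As a hedged sketch, a helper along the following lines could confirm that a gist supplies all five template parts before any rendering is attempted. `checkTemplateFiles`, `REQUIRED_FILES` and the sample dict are hypothetical and not part of the original demo.

```python
REQUIRED_FILES = ['prefix.txt', 'rowtemplate.txt', 'rowseparator.txt',
                  'suffix.txt', 'colNames.json']

def checkTemplateFiles(tfiles):
    # Return the names of any required template files missing from the gist
    return [name for name in REQUIRED_FILES if name not in tfiles]

# Hypothetical, trimmed-down stand-in for r.json()['files']
sample_files = {
    'prefix.txt': {'content': '{"features": ['},
    'rowtemplate.txt': {'content': '{{Place}}'},
    'suffix.txt': {'content': ']}'},
}
missing = checkTemplateFiles(sample_files)
print(missing)
```

In practice the argument would be the dict returned by `getTemplateFiles`; an empty list means the template is complete.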
Define a function to display the contents of the colNames.json file.
def showColNames(gist):
    tfiles=getTemplateFiles(gist)
    # The file content is a JSON string, so parse it into a dict
    return json.loads(tfiles['colNames.json']['content'])
Create a demo dataframe. The demo template creates a GeoJSON file from a table.
df=pd.DataFrame({"Place":["London, UK","Cambridge, UK","Paris, France"],
"Longitude":[-0.1276597,0.124862,2.3521334],
"Latitude":[51.5072759, 52.2033051 , 48.8565056] })
df
| | Latitude | Longitude | Place |
|---|---|---|---|
| 0 | 51.507276 | -0.127660 | London, UK |
| 1 | 52.203305 | 0.124862 | Cambridge, UK |
| 2 | 48.856506 | 2.352133 | Paris, France |
3 rows × 3 columns
Check which column names the template expects, along with a description of the role each column plays. Maybe the JSON file should include additional information, such as the expected type of each column?
showColNames('ddb4b862fc65be2aeb3b')
{'Lat': 'the name of the latitude column', 'Place': 'the name of the location', 'Lon': 'the name of the longitude column'}
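Since the template can only be filled in if every name listed in colNames.json is available after renaming, a small pre-flight check may help. The sketch below is an assumption about how such a check might look. `missingTemplateColumns` is a hypothetical helper, and it takes plain column names so that it can be tried without a dataframe (in practice `df.columns` could be passed).

```python
def missingTemplateColumns(columns, renames, colnames):
    # Apply the same name mapping that df.rename(columns=renames) would apply
    renamed = {renames.get(c, c) for c in columns}
    # Report template column names that no dataframe column provides
    return sorted(name for name in colnames if name not in renamed)

colnames = {'Lat': 'the name of the latitude column',
            'Place': 'the name of the location',
            'Lon': 'the name of the longitude column'}
columns = ['Latitude', 'Longitude', 'Place']

complete = missingTemplateColumns(columns, {'Latitude': 'Lat', 'Longitude': 'Lon'}, colnames)
partial = missingTemplateColumns(columns, {'Latitude': 'Lat'}, colnames)
print(complete)
print(partial)
```

The first call reports nothing missing; the second reports `Lon`, because the Longitude column was never mapped to the template's name.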
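Define a function to apply the template. The function expects:
- df: the dataframe to export
- gist: the ID (or GitHub API URL) of the gist containing the template files
- renames: a dict mapping dataframe column names to the column names used in the template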
def parser(df,gist,renames):
    # rename() returns a new dataframe, so the caller's df is left unchanged
    df = df.rename(columns=renames)
    tfiles=getTemplateFiles(gist)
    #Currently blocks are separated by a \n
    #Should we also define a common or separate block separators?
    #If so, should separators be in separate files or a single JSON file?
    txt=tfiles['prefix.txt']['content']+'\n'
    rt=tfiles['rowtemplate.txt']['content']
    rows=[]
    #Should we try to indent every line of text with an additional tab here?
    for i in range(len(df)):
        # .ix has been removed from pandas; select each row by position instead
        cells=df.iloc[i].to_dict()
        rows.append(pystache.render(rt, cells))
    txt=txt+'\n'+tfiles['rowseparator.txt']['content'].join(rows)
    txt=txt+'\n'+tfiles['suffix.txt']['content']
    return txt
print(parser(df,'ddb4b862fc65be2aeb3b',{'Latitude': 'Lat', 'Longitude': 'Lon'}))
{"features": [ {"geometry": { "coordinates": [ -0.1276597, 51.5072759 ], "type": "Point"}, "id": "London, UK", "properties": {}, "type": "Feature" },{"geometry": { "coordinates": [ 0.124862, 52.2033051 ], "type": "Point"}, "id": "Cambridge, UK", "properties": {}, "type": "Feature" },{"geometry": { "coordinates": [ 2.3521334, 48.8565056 ], "type": "Point"}, "id": "Paris, France", "properties": {}, "type": "Feature" } ], "type": "FeatureCollection"}
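Because the output is meant to be GeoJSON, one quick sanity check is to parse it with `json.loads` and inspect the features. The sketch below uses an abbreviated copy of the output shown above (first feature only) rather than calling `parser`, so it runs without network access. The variable names are illustrative.

```python
import json

# Abbreviated copy of the generated text (first feature only)
output = ('{"features": [ {"geometry": { "coordinates": [ -0.1276597, 51.5072759 ], '
          '"type": "Point"}, "id": "London, UK", "properties": {}, "type": "Feature" } ], '
          '"type": "FeatureCollection"}')

collection = json.loads(output)  # raises ValueError if the text is not valid JSON
for feature in collection['features']:
    # GeoJSON positions are ordered longitude first, then latitude
    lon, lat = feature['geometry']['coordinates']
    print(feature['id'], lon, lat)
```

A parse failure here would usually point at the template files (for example a missing row separator) rather than at the dataframe.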
df
| | Latitude | Longitude | Place |
|---|---|---|---|
| 0 | 51.507276 | -0.127660 | London, UK |
| 1 | 52.203305 | 0.124862 | Cambridge, UK |
| 2 | 48.856506 | 2.352133 | Paris, France |
3 rows × 3 columns