#!/usr/bin/env python # coding: utf-8 # # SpatiaLite Datasette Demo # # This notebook provides a quick walkthrough of getting started with a SpatiaLite geo-database and using it with Datasette. # ## Get the Data # # The SpatiaLite database can be used to store, index and perform various geo-related query operations on various geographical objects including points and shapefiles. # # To help me get up to speed, I'm going to try to load in a shapefile delimiting various bits of the UK into a SpatiaLite database, publish it as a datasette, retrieve a boundary for a particular region from the datasette API and then plot the boundary on a map. # # The shapefile I'm going to use is of UK administrative areas described by the Ordnance Survey Boundary-Line product. # # You can get a copy of the data from https://www.ordnancesurvey.co.uk/opendatadownload/products.html by selecting the *Boundary-Line* product and providing contact details, at which point you should be emailed a download link. # # Download the data and unzip it. This should create a folder named `bdline_essh_gb`, or something similar. (The paths used in this notebook assumes that it is running inside that directory.) # Inside the `bdline_essh_gb` ditectory is a `Data` subdirectory, and inside that is a `GB` subdirectory containing a variety of unpacked shapefiules, including bit not limited to: # # ``` # district_borough_unitary_region.dbf parish_region.shp # district_borough_unitary_region.prj parish_region.shx # district_borough_unitary_region.shp scotland_and_wales_const_region.dbf # district_borough_unitary_region.shx scotland_and_wales_const_region.prj # district_borough_unitary_ward_region.cpg scotland_and_wales_const_region.shp # district_borough_unitary_ward_region.dbf scotland_and_wales_const_region.shx # district_borough_unitary_ward_region.prj scotland_and_wales_region.dbf # district_borough_unitary_ward_region.sbn scotland_and_wales_region.prj # district_borough_unitary_ward_region.sbx scotland_and_wales_region.shp # district_borough_unitary_ward_region.shp scotland_and_wales_region.shx # district_borough_unitary_ward_region.shx # ``` # ## Loading Shapefile Data into SQLite # # The [datasette docs](http://datasette.readthedocs.io/en/latest/spatialite.html) currently suggest creating a SpatiaLite database using a command of the form `spatialite DATABASE.db` and then loading shapefiles into it using a SpatiaLite commandline command of the form: `.loadshp SHAPEFILE TABLENAME CP1252 23032`. # # That bit of voodoo actually unpacks a bit further as: # # `.loadshp PATH_AND_SHAPEFILE_NAME TABLENAME FILE_ENCODING [PROJECTION]` # # The `CP1252` file encoding is for the default Microsoft Windows `Latin-1` encoding, so I'm guessing that if your shapefile use another encoding you need to change that. (Internally, SpatiaLite uses UTF-8. `UTF-8` is also acceptable as a file encoding value in the above commandline command; a full list of acceptable values can be found by trying to load a shapefile using `spatialite_gui`.) The file encoding is a *required* argument, as are the path to the shapefile name and the table name. The projection is optional. # # The projection relates to the projection used within the shapefile. Setting this correctly allows you to transform the data to other projections, such as the *WGS84* (aka *EPSG:4326*, *GPS*, *latitude / longitude*) projection. # # The projection is identified using its SRID (*Spatial Reference System Identifier*) code ([lookup](http://spatialreference.org/)). If no code is provided, no projection is identified with the shapefile geodata in the database (it's give a -1 code). (I had expected the projection to be identified from the `.prj` (projection) file which contains a WKT description of the projection used in each `.shp` shapefile.) # # I didn't meet with much success looking for a Python library to identify the required SRID from the Ordnance Survey shapefiles, but did find an online API that seemed to do the trick: # In[100]: #Load in the shapefile projection details with open('./Data/GB/district_borough_unitary_region.prj', 'r') as myfile: prj=myfile.read() prj # In[113]: #Look up the correpsonding SRID import requests #http://prj2epsg.org/apidocs.html params = {'mode': 'wkt', 'terms':prj} r = requests.get('http://prj2epsg.org/search.json', params=params) #Use the first hit as the most likely SRID srid = r.json()['codes'][0] srid # This tells us that the co-ordinates used in the shapefile as given as OSGB Notrhings and Eastings. # # Use this code value (27700) to load out shapefile in with the correct projection code: # In[123]: # Load the data into a new SpatiaLite database get_ipython().system(' spatialite adminboundaries.db ".loadshp ./Data/GB/district_borough_unitary_region adminboundaries UTF-8 27700"') # The *Northings/Eastings* projection used by the OS shapefile is not directly plottable using many simple interactive web maps. Instead, they need GPS style latitude/longitude co-ordinates. Given that we know the projection of the original shapefile, we can create a new set of transformed co-ordinates using the required lat/long (WGS84) projection. # # Let's create a simple text file containing a SQL script to handle that transformation for us: # In[124]: get_ipython().run_cell_magic('bash', '', "echo '''\nBEGIN;\nALTER TABLE adminboundaries ADD COLUMN wgs84 BLOB;\nUPDATE adminboundaries SET wgs84 = Transform(Geometry, 4326);\nCOMMIT;\n''' > project2wsg84.sql\n") # In[125]: get_ipython().system(' cat project2wsg84.sql') # We can now read and execute that script against our database (once again, the file encoding appears to be required?) # In[127]: get_ipython().system(' spatialite adminboundaries.db ".read project2wsg84.sql utf-8"') # ## Accessing the SpatiaLite database Using Datasette # # Having created our database and generated a projection approropriate for plotting using an interactive web map, let's publish it as a datasette and then see if we can access the data from the datasette API. # # Here's a trick I just learned via a tweet from *@minrk* for running a simple web service from within a Jupyter notebook code cell without blocking the notebook execution: # In[4]: # Example of running datasette server from inside a code cell # via https://nbviewer.jupyter.org/gist/minrk/622080cf8af787734805d12bec4ae302from threading import Thread from threading import Thread def app_in_thread(): get_ipython().system(' datasette adminboundaries.db --load-extension=/usr/local/lib/mod_spatialite.dylib --config sql_time_limit_ms:10000 --cors') t = Thread(target=app_in_thread) t.start() # Alternative mutlitasking package # https://github.com/micahscopes/nbmultitask # We should now be able to access the datasette API. # In[24]: #What region shall we search for a boundary for? #How about the diamond isle just off the south coast... region='wight' # In[25]: import requests params = {'sql': 'SELECT AsGeoJSON(wgs84) FROM adminboundaries WHERE name LIKE "%{}%"'.format(region)} json_url = "http://localhost:8001/adminboundaries.json" r = requests.get(json_url, params=params) results = r.json() # In[26]: # Get geojson feed served from datasette - the result is actually a string # so convert the string to actual json, qua py dict import json geojson=json.loads(results['rows'][0][0]) # Now let's see if we can render that region as a map. # In[27]: #folium is a handy package I've used for ages for displaying maps in Jupyer notebooks import folium # In[28]: # Get example point from geojson to help center the map lat,lng = tuple(geojson['coordinates'][0][0][0]) # In[29]: #Render the map m = folium.Map(location=(lng,lat), zoom_start=10) m.choropleth(geo_data=geojson, line_color='blue',line_weight=3) m # So that all seems to work... :-)