telluric

Interactive vector and raster geospatial data manipulation in Python

Guy Doulberg - 2018-06-04 @ PyCon-IL

https://github.com/satellogic/telluric-talks

telluric

Satellogic

  • We are building a constellation of satellites that orbits earth.
  • We are building an infrastructure to collect and store data (mostly images) from this constellation.
  • We are producing insights out of this data and other sources of data.
  • We are hiring

satelloguc

Guy

  • I am part of the team which is responsilbe for collecting storing and accesing the Raster and Vector Data in Satellogic.
  • Software engineer with experience in big data projects.

1. Quick intro to geospatial data

Geospatial data, geographic data, georeferenced data, or just geodata "are defined in the ISO/TC 211 series of standards as data and information having an implicit or explicit association with a location relative to the Earth"

-- https://en.wikipedia.org/wiki/Geographic_data_and_information

A geographic information system (GIS) is a system designed to capture, store, manipulate, analyze, manage, and present spatial or geographic data.

-- https://en.wikipedia.org/wiki/Geographic_information_system

Two forms: vector and raster

  • Vector data:
    • Defining shapes
  • Raster data:
    • Assigning pixels values

Vector vs Raster

Image by Víctor Olaya "Sistemas de Información Geográfica", CC-BY

Vector data

  • Entities: Points, Lines and Polygons
  • Common formats: ESRI Shapefiles, GeoJSON, TopoJSON, ...

Vector data

Image by M. W. Toews https://en.wikipedia.org/wiki/File:Simple_vector_map.svg, CC-BY

Raster data

  • Common formats: GeoTIFF, JPEG2000, SRTM3, ...

Raster data

GEO referenced

  • Vectors and Rasters can exist in fields other than GIS.
  • The are part of a GIS when they are Geo-Referenced.
  • For example:
    • A Polygon is Geo-Referenced when you know how to draw it on earth
    • A Raster is Geo-Referenced when you know where is the raster located

Projections

A method to locate points on earth

TL;DR: A mess.

  • The Earth is approximately a sphere, which is not a developable surface (i.e. it cannot be flattened onto a plane without distortion)
  • Therefore, we need projections that attempt to represent the Earth on a plane, with various tradeoffs and limitations

Projections

2. Why yet another Existing software

The landscape is complicated and somewhat difficult to navigate at times:

Landscape

Summary of key libraries:

Python C/C++
pyproj PROJ.4
Fiona OGR
Shapely GEOS
rasterio GDAL

3. telluric library

telluric is an open source (MIT) Python library to manage vector and raster geospatial data in an interactive and easy way.

Importing for interactive use is short:

In [1]:
import telluric as tl
from telluric.constants import WGS84_CRS, WEB_MERCATOR_CRS

Basic geometry definition using GeoVector: in the simplest case, the bounds and the projection (CRS)

In [2]:
gush_dan = tl.GeoVector.from_bounds(
    xmin=34.6, ymin=32, xmax=35.0, ymax=32.3, crs=WGS84_CRS
)
print(gush_dan)
GeoVector(shape=POLYGON ((34.6 32, 34.6 32.3, 35 32.3, 35 32, 34.6 32)), crs=CRS({'init': 'epsg:4326'}))

We added extra attention to Jupyter noteobook visualization, checkout this plot

In [3]:
gush_dan
/home/guyd/pEnvs/telluric-dev/local/lib/python3.5/site-packages/telluric/plotting.py:141: UserWarning: Plotting a limited representation of the data, use the .plot() method for further customization
  "Plotting a limited representation of the data, use the .plot() method for further customization")
Out[3]:

Using Shapely geometries is also allowed:

In [4]:
from shapely.geometry import Polygon

south_tlv = tl.GeoVector(
    Polygon([(34.75, 32.04), (34.75, 32.06), (34.77,32.06), (34.77, 32.04), ( 34.75, 32.04)]),
    WGS84_CRS
)
south_tlv
/home/guyd/pEnvs/telluric-dev/local/lib/python3.5/site-packages/telluric/plotting.py:141: UserWarning: Plotting a limited representation of the data, use the .plot() method for further customization
  "Plotting a limited representation of the data, use the .plot() method for further customization")
Out[4]:

Access several properties:

In [5]:
gush_dan.centroid
/home/guyd/pEnvs/telluric-dev/local/lib/python3.5/site-packages/telluric/plotting.py:141: UserWarning: Plotting a limited representation of the data, use the .plot() method for further customization
  "Plotting a limited representation of the data, use the .plot() method for further customization")
Out[5]:
In [6]:
gush_dan.area  # Real area in square meters
Out[6]:
1259014132.9260154

Geomteric Relations

In [7]:
gush_dan.within(south_tlv)
Out[7]:
False
In [8]:
south_tlv.within(gush_dan)
Out[8]:
True
In [9]:
print(south_tlv.difference(gush_dan))
GeoVector(shape=GEOMETRYCOLLECTION EMPTY, crs=CRS({'init': 'epsg:4326'}))
In [10]:
gush_dan.difference(south_tlv)
/home/guyd/pEnvs/telluric-dev/local/lib/python3.5/site-packages/telluric/plotting.py:141: UserWarning: Plotting a limited representation of the data, use the .plot() method for further customization
  "Plotting a limited representation of the data, use the .plot() method for further customization")
Out[10]:
  • Features: GeoVector + attributes
  • FeatureCollections: a sequence of features
In [11]:
gf1 = tl.GeoFeature(
    gush_dan,
    {'district': 'גוש דן'}
)
gf2 = tl.GeoFeature(
    south_tlv,
    {'city': 'תל אביב'}
)
print(gf1)
print(gf2)
GeoFeature(Polygon, {'district': 'גוש דן'})
GeoFeature(Polygon, {'city': 'תל אביב'})
In [12]:
fc = tl.FeatureCollection([gf1, gf2])
fc
/home/guyd/pEnvs/telluric-dev/local/lib/python3.5/site-packages/telluric/plotting.py:141: UserWarning: Plotting a limited representation of the data, use the .plot() method for further customization
  "Plotting a limited representation of the data, use the .plot() method for further customization")
Out[12]:

Raster data:

In [13]:
# This will only save the URL in memory
rs = tl.GeoRaster2.open(
    "https://github.com/mapbox/rasterio/raw/master/tests/data/rgb_deflate.tif"
)

# These calls will fecth some GeoTIFF metadata
# without reading the whole image
print(rs.crs)
print(rs.band_names)
CRS({'init': 'epsg:32618'})
[0, 1, 2]
In [14]:
rs.footprint() # is the bouding box
/home/guyd/pEnvs/telluric-dev/local/lib/python3.5/site-packages/telluric/plotting.py:141: UserWarning: Plotting a limited representation of the data, use the .plot() method for further customization
  "Plotting a limited representation of the data, use the .plot() method for further customization")
Out[14]:
In [15]:
rs
Out[15]:
In [29]:
rs.image[:,200:300, 200:240]
Out[29]:
masked_array(
  data=[[[15, 20, 23, ..., 41, 38, 29],
         [17, 15, 23, ..., 41, 46, 46],
         [18, 17, 17, ..., 39, 39, 44],
         ...,
         [106, 49, 62, ..., 46, 58, 50],
         [255, 255, 255, ..., 38, 37, 43],
         [255, 229, 255, ..., 40, 31, 35]],

        [[94, 98, 100, ..., 129, 98, 62],
         [98, 96, 100, ..., 127, 127, 133],
         [98, 96, 94, ..., 126, 126, 129],
         ...,
         [145, 79, 116, ..., 47, 58, 51],
         [255, 255, 255, ..., 43, 41, 45],
         [255, 249, 255, ..., 41, 33, 37]],

        [[131, 131, 131, ..., 161, 125, 86],
         [133, 133, 135, ..., 164, 160, 161],
         [133, 132, 129, ..., 159, 160, 157],
         ...,
         [141, 77, 108, ..., 29, 34, 31],
         [255, 255, 255, ..., 31, 30, 31],
         [255, 255, 255, ..., 23, 19, 27]]],
  mask=[[[False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         ...,
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False]],

        [[False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         ...,
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False]],

        [[False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         ...,
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False]]],
  fill_value=999999,
  dtype=uint8)
In [16]:
# Crop fetch only the data required to build the raster
rs.crop(rs.footprint().buffer(-50000))
Out[16]: