import pyrasterframes
from pyrasterframes.utils import create_rf_spark_session
import pyrasterframes.rf_ipython # enables nicer visualizations of pandas DF
from pyrasterframes.rasterfunctions import *
import pyspark.sql.functions as F
spark = create_rf_spark_session()
Read a single scene of elevation into a DataFrame of raster tiles.
Each tile overlaps its neighbors by `buffer_size` pixels, giving focal operations the neighbor information they need around tile edges.
You can configure the size of these tiles by passing a tuple of desired columns and rows, e.g. `raster(uri, tile_dimensions=(96, 96))`; the default is `(256, 256)`.
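To see why the buffer matters, here is a minimal NumPy sketch (illustrative only, not RasterFrames internals): a 3x3 focal mean at a tile's edge needs one pixel of neighboring data, so a `buffer_size` of 2 supports kernels up to 5x5 without edge artifacts.

```python
import numpy as np

def focal_mean_3x3(tile, buffer):
    """Focal mean over a tile that carries `buffer` extra pixels on each side.

    `tile` has shape (rows + 2*buffer, cols + 2*buffer); the result covers
    only the core (unbuffered) pixels, each averaged with its 8 neighbors.
    """
    rows, cols = tile.shape[0] - 2 * buffer, tile.shape[1] - 2 * buffer
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # Window centered on the core pixel at (r, c); the buffer
            # guarantees the window never falls outside the array.
            cr, cc = r + buffer, c + buffer
            out[r, c] = tile[cr - 1:cr + 2, cc - 1:cc + 2].mean()
    return out

# A 4x4 core tile with a buffer of 2 pixels on each side -> 8x8 array.
buffered = np.arange(64, dtype=float).reshape(8, 8)
core_mean = focal_mean_3x3(buffered, buffer=2)
print(core_mean.shape)  # (4, 4)
```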
uri = 'https://geotrellis-demo.s3.us-east-1.amazonaws.com/cogs/harrisburg-pa/elevation.tif'
df = spark.read.raster(uri, tile_dimensions=(512, 512), buffer_size=2)
df.printSchema()
root
 |-- proj_raster_path: string (nullable = false)
 |-- proj_raster: struct (nullable = true)
 |    |-- tile: tile (nullable = true)
 |    |-- extent: struct (nullable = true)
 |    |    |-- xmin: double (nullable = false)
 |    |    |-- ymin: double (nullable = false)
 |    |    |-- xmax: double (nullable = false)
 |    |    |-- ymax: double (nullable = false)
 |    |-- crs: crs (nullable = true)
The extent struct tells us what area of the CRS each tile's data covers. The granule is split into chunks of the requested tile size, one chunk per row. Let's see how many.
df.count()
81
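The count follows from the tiling: a scene of W x H pixels read with (512, 512) tiles yields ceil(W/512) * ceil(H/512) rows, and 81 = 9 x 9, so this granule spans at most 9 tiles (4608 pixels) in each direction. A quick sketch (the scene dimensions below are illustrative, not read from the file):

```python
import math

def tile_count(width, height, tile_cols=512, tile_rows=512):
    """Number of tiles needed to cover a width x height raster."""
    return math.ceil(width / tile_cols) * math.ceil(height / tile_rows)

# Any dimensions between 8*512 + 1 = 4097 and 9*512 = 4608 pixels per side
# produce the 9 x 9 = 81 tiles observed above.
print(tile_count(4500, 4300))  # 81
```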
Additional transformations are accomplished through column functions. The functions used here are mapped to their Scala implementations and applied per row. For each row, the source elevation data is fetched only once before being used as input.
df.select(
rf_crs(df.proj_raster),
rf_extent(df.proj_raster),
rf_aspect(df.proj_raster),
rf_slope(df.proj_raster, z_factor=1),
rf_hillshade(df.proj_raster, azimuth=315, altitude=45, z_factor=1))
| rf_crs(proj_raster) | rf_extent(proj_raster) | rf_aspect(proj_raster) | rf_slope(proj_raster, 1) | rf_hillshade(proj_raster, 315, 45, 1) |
| --- | --- | --- | --- | --- |
| utm-CS | {240929.2154, 4398599.0319, 256289.2154, 4401599.0319} | | | |
| utm-CS | {210209.2154, 4432319.0319, 225569.2154, 4447679.0319} | | | |
| utm-CS | {256289.2154, 4416959.0319, 271649.2154, 4432319.0319} | | | |
| utm-CS | {271649.2154, 4509119.0319, 287009.2154, 4524479.0319} | | | |
| utm-CS | {333089.2154, 4398599.0319, 341969.2154, 4401599.0319} | | | |

(The aspect, slope, and hillshade columns render as tile images in the notebook.)
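For intuition about the `azimuth`, `altitude`, and `z_factor` parameters, here is a hedged NumPy sketch of the classic hillshade formula (Horn-style gradients; RasterFrames' exact implementation may differ in edge and nodata handling):

```python
import numpy as np

def hillshade(elev, cellsize=1.0, azimuth=315.0, altitude=45.0, z_factor=1.0):
    """Shaded relief from an elevation array using the standard formula:
    255 * (cos(zenith)*cos(slope) + sin(zenith)*sin(slope)*cos(azimuth - aspect)).
    """
    az = np.radians(azimuth)
    zenith = np.radians(90.0 - altitude)  # sun angle from vertical
    # Elevation gradients; z_factor scales vertical units relative to cell size.
    dz_dy, dz_dx = np.gradient(elev * z_factor, cellsize)
    slope = np.arctan(np.hypot(dz_dx, dz_dy))
    aspect = np.arctan2(dz_dy, -dz_dx)
    shaded = (np.cos(zenith) * np.cos(slope)
              + np.sin(zenith) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(255 * shaded, 0, 255)

# A flat surface is uniformly lit: slope is 0 everywhere, so every pixel
# is 255 * cos(zenith), about 180 for altitude=45.
flat = hillshade(np.zeros((4, 4)))
```

Raising `altitude` toward 90 flattens the shading; `azimuth=315` puts the light source in the northwest, the cartographic convention.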