import pyrasterframes
from pyrasterframes.utils import create_rf_spark_session
import pyrasterframes.rf_ipython # enables nicer visualizations of pandas DF
from pyrasterframes.rasterfunctions import *
import pyspark.sql.functions as F
spark = create_rf_spark_session()
Read a single scene of elevation into a DataFrame of raster tiles.
Each tile overlaps its neighbors by `buffer_size` pixels, giving focal operations the neighbor information they need around tile edges.
You can configure the size of these tiles by passing a tuple of desired columns and rows, e.g. `raster(uri, tile_dimensions=(96, 96))`; the default is `(256, 256)`.
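To see why the buffer matters, here is a minimal NumPy sketch (illustrative only, not RasterFrames internals): a 3x3 focal mean at a tile's edge needs one pixel of neighboring data, so a `buffer_size` of 2 supports kernels up to 5x5 without edge artifacts.

```python
import numpy as np

def focal_mean_3x3(tile, buffer):
    """Focal mean over a tile that carries `buffer` extra pixels on each side.

    `tile` has shape (rows + 2*buffer, cols + 2*buffer); the result covers
    only the core (unbuffered) pixels, each averaged with its 8 neighbors.
    """
    rows, cols = tile.shape[0] - 2 * buffer, tile.shape[1] - 2 * buffer
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # Window centered on the core pixel at (r, c); the buffer
            # guarantees the window never falls outside the array.
            cr, cc = r + buffer, c + buffer
            out[r, c] = tile[cr - 1:cr + 2, cc - 1:cc + 2].mean()
    return out

# A 4x4 core tile with a buffer of 2 pixels on each side -> 8x8 array.
buffered = np.arange(64, dtype=float).reshape(8, 8)
core_mean = focal_mean_3x3(buffered, buffer=2)
print(core_mean.shape)  # (4, 4)
```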
uri = 'https://geotrellis-demo.s3.us-east-1.amazonaws.com/cogs/harrisburg-pa/elevation.tif'
df = spark.read.raster(uri, tile_dimensions=(512, 512), buffer_size=2)
df.printSchema()
root
 |-- proj_raster_path: string (nullable = false)
 |-- proj_raster: struct (nullable = true)
 |    |-- tile: tile (nullable = true)
 |    |-- extent: struct (nullable = true)
 |    |    |-- xmin: double (nullable = false)
 |    |    |-- ymin: double (nullable = false)
 |    |    |-- xmax: double (nullable = false)
 |    |    |-- ymax: double (nullable = false)
 |    |-- crs: crs (nullable = true)
The extent struct tells us what area of the CRS each tile's data covers. The granule is split into chunks of the requested tile size, one chunk per row. Let's see how many.
df.count()
81
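The count follows from the tiling: a scene of W x H pixels read with (512, 512) tiles yields ceil(W/512) * ceil(H/512) rows, and 81 = 9 x 9, so this granule spans at most 9 tiles (4608 pixels) in each direction. A quick sketch (the scene dimensions below are illustrative, not read from the file):

```python
import math

def tile_count(width, height, tile_cols=512, tile_rows=512):
    """Number of tiles needed to cover a width x height raster."""
    return math.ceil(width / tile_cols) * math.ceil(height / tile_rows)

# Any dimensions between 8*512 + 1 = 4097 and 9*512 = 4608 pixels per side
# produce the 9 x 9 = 81 tiles observed above.
print(tile_count(4500, 4300))  # 81
```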
Additional transformations are accomplished through column functions. The functions used here are mapped to their Scala implementations and applied per row. For each row, the source elevation data is fetched only once before being used as input.
df.select(
rf_crs(df.proj_raster),
rf_extent(df.proj_raster),
rf_aspect(df.proj_raster),
rf_slope(df.proj_raster, z_factor=1),
rf_hillshade(df.proj_raster, azimuth=315, altitude=45, z_factor=1))
| rf_crs(proj_raster) | rf_extent(proj_raster) | rf_aspect(proj_raster) | rf_slope(proj_raster, 1) | rf_hillshade(proj_raster, 315, 45, 1) |
| --- | --- | --- | --- | --- |
| utm-CS | {240929.2154, 4398599.0319, 256289.2154, 4401599.0319} | | | |
| utm-CS | {210209.2154, 4432319.0319, 225569.2154, 4447679.0319} | | | |
| utm-CS | {256289.2154, 4416959.0319, 271649.2154, 4432319.0319} | | | |
| utm-CS | {271649.2154, 4509119.0319, 287009.2154, 4524479.0319} | | | |
| utm-CS | {333089.2154, 4398599.0319, 341969.2154, 4401599.0319} | | | |

(The aspect, slope, and hillshade columns render as tile images in the notebook.)
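For intuition about the `azimuth`, `altitude`, and `z_factor` parameters, here is a hedged NumPy sketch of the classic hillshade formula (Horn-style gradients; RasterFrames' exact implementation may differ in edge and nodata handling):

```python
import numpy as np

def hillshade(elev, cellsize=1.0, azimuth=315.0, altitude=45.0, z_factor=1.0):
    """Shaded relief from an elevation array using the standard formula:
    255 * (cos(zenith)*cos(slope) + sin(zenith)*sin(slope)*cos(azimuth - aspect)).
    """
    az = np.radians(azimuth)
    zenith = np.radians(90.0 - altitude)  # sun angle from vertical
    # Elevation gradients; z_factor scales vertical units relative to cell size.
    dz_dy, dz_dx = np.gradient(elev * z_factor, cellsize)
    slope = np.arctan(np.hypot(dz_dx, dz_dy))
    aspect = np.arctan2(dz_dy, -dz_dx)
    shaded = (np.cos(zenith) * np.cos(slope)
              + np.sin(zenith) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(255 * shaded, 0, 255)

# A flat surface is uniformly lit: slope is 0 everywhere, so every pixel
# is 255 * cos(zenith), about 180 for altitude=45.
flat = hillshade(np.zeros((4, 4)))
```

Raising `altitude` toward 90 flattens the shading; `azimuth=315` puts the light source in the northwest, the cartographic convention.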