We have a complex KML file with 2000+ individual layers. Each layer contains only a single feature, but the large number of layers means the file takes a long time to read and parse. We can use GeoPandas to read all the layers, merge the layers that share a geometry type, and write the cleaned layers to a GeoPackage.
The following blocks of code install and import the required packages in a Colab environment.
try:
    import geopandas
except ModuleNotFoundError:
    if 'google.colab' in str(get_ipython()):
        !apt install -qq libspatialindex-dev
        !pip install fiona shapely pyproj rtree --quiet
        !pip install geopandas --quiet
    else:
        print('geopandas not found, please install via conda in your environment')
import os
import pandas as pd
import geopandas as gpd
import fiona
Create a list of layers as a Pandas Series.
file_path = 'input.kml'
layers = pd.Series(fiona.listlayers(file_path))
layers
0 Text [7193A]
1 Point [71939]
2 Text [71937]
3 Point [71936]
4 Text [71934]
...
2177 MText [70BD3]
2178 MText [70BD2]
2179 MText [70BD1]
2180 Polyline [70BD0]
2181 Untitled layer
Length: 2182, dtype: object
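The layer names appear to follow a `Type [ID]` pattern, so a quick tally by type gives a sense of what the file contains before reading it. The following is a minimal sketch, assuming that naming pattern holds; `sample_layers` is a hypothetical stand-in for the full `layers` Series.

```python
import pandas as pd

# Hypothetical sample of layer names in the same "Type [ID]" format
sample_layers = pd.Series([
    'Text [7193A]', 'Point [71939]', 'Text [71937]',
    'MText [70BD3]', 'Polyline [70BD0]', 'Untitled layer',
])

# Drop the trailing bracketed ID, leaving only the layer type.
# Names without a bracketed ID are left unchanged.
layer_types = sample_layers.str.replace(r'\s*\[.*\]$', '', regex=True)

# Count how many layers there are of each type
type_counts = layer_types.value_counts()
print(type_counts)
```

The same two lines can be applied to `layers` itself to summarize the real file.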
Iterate through the layers and read each one with GeoPandas, collecting the resulting GeoDataFrames in a list.
This step can take time, so use tqdm to display a progress bar.
from tqdm.notebook import tqdm

# Register the KML driver with Fiona for reading and writing
fiona.drvsupport.supported_drivers['KML'] = 'rw'

gdf_list = []
for index, layer in tqdm(layers.items(), total=len(layers)):
    # Read one layer at a time into its own GeoDataFrame
    gdf = gpd.read_file(file_path, layer=layer)
    gdf_list.append(gdf)
Merge the individual GeoDataFrames from each layer into a single GeoDataFrame.
merged = pd.concat(gdf_list)
merged.head()
| | Name | Description | geometry |
|---|---|---|---|
| 0 | B2-461 | | POINT Z (78.50425 17.68199 0.00000) |
| 0 | Point [71939]:0 | <table> <tr> <td align="right">Generic Propert... | MULTILINESTRING Z ((78.50424 17.68198 0.00000,... |
| 0 | B2-460 | | POINT Z (78.49496 17.68535 0.00000) |
| 0 | Point [71936]:0 | <table> <tr> <td align="right">Generic Propert... | MULTILINESTRING Z ((78.49496 17.68535 0.00000,... |
| 0 | B2-459 | | POINT Z (78.49141 17.69321 0.00000) |
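Because every layer holds a single feature, each row of the merged table carries index 0, and the table no longer records which layer a row came from. The following is a minimal sketch of one way to handle both, assuming two small hand-made GeoDataFrames; `layer_a`, `layer_b`, the second line coordinate, and the `source_layer` column are hypothetical and not part of the original notebook.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString, Point

# Hypothetical single-feature layers standing in for the KML layers
layer_a = gpd.GeoDataFrame(
    {'Name': ['B2-461']},
    geometry=[Point(78.50425, 17.68199)],
    crs='EPSG:4326',
)
layer_b = gpd.GeoDataFrame(
    {'Name': ['Point [71939]:0']},
    geometry=[LineString([(78.50424, 17.68198), (78.50430, 17.68200)])],
    crs='EPSG:4326',
)
frames = {'Text [7193A]': layer_a, 'Point [71939]': layer_b}

# Tag each GeoDataFrame with the name of the layer it came from
tagged = [gdf.assign(source_layer=name) for name, gdf in frames.items()]

# ignore_index=True replaces the repeated 0 index with 0, 1, 2, ...
merged_sample = pd.concat(tagged, ignore_index=True)
print(merged_sample)
```

In the notebook itself, the tag could be added inside the reading loop, where the layer name is already available as `layer`.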
The geometry column includes several geometry types, and each type has to be saved to a separate layer. Iterate over the geometry types and write each one out as a layer in the output GeoPackage.
output_file = 'merged.gpkg'
for geomtype in merged.geom_type.unique():
    merged[merged.geom_type == geomtype].to_file(output_file, driver='GPKG', layer=geomtype)
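To see which layers this loop produces and how many features land in each, the geometry types can be tallied without writing any file. The following is a minimal sketch on a hand-made GeoDataFrame; `sample`, the `Polyline [70BD0]:0` feature name, and the line coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString, Point

# Hypothetical stand-in for the merged GeoDataFrame
sample = gpd.GeoDataFrame(
    {'Name': ['B2-461', 'B2-460', 'Polyline [70BD0]:0', 'Point [71939]:0']},
    geometry=[
        Point(78.50425, 17.68199),
        Point(78.49496, 17.68535),
        LineString([(78.50, 17.68), (78.51, 17.69)]),
        MultiLineString([[(78.50424, 17.68198), (78.50430, 17.68200)]]),
    ],
    crs='EPSG:4326',
)

# geom_type gives one string per row, such as 'Point' or 'MultiLineString'.
# Each distinct value becomes one layer name in the output GeoPackage.
counts = sample.geom_type.value_counts()
print(counts)
```

Running `merged.geom_type.value_counts()` on the real data gives the same summary for the full file.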