Purpose of Notebook: to display our injury prevention partner programs on an asset map. Each partner program fills out our Survey123 form, which collects information on the program's services and service area boundaries. Each survey entry is meant to produce both a point (the headquarters) and a polygon (the service area) for display on the final map. Currently, the only geometry drawn when a survey entry is submitted is the service area polygon, and only when the participant draws their own service area boundary.
Many of the service area boundaries align with zip codes or the city's boundary, but the Survey123 form does not automatically create these geometries when the service area boundary question is filled out. The corresponding geometries are therefore added after the survey entry is submitted.
The code updates the geometry of each row depending on the populated fields. For example, if a program's service area is citywide, that row automatically receives the citywide boundary built from dissolved zip codes. For programs that selected multiple zip codes, the code lists those zip codes, matches them to the zip codes layer by name, merges them, and dissolves them into a single boundary for the survey output layer. If a program chooses to display police districts instead, the same is done with police districts.
The code also parses and geocodes addresses using the Address Information System (AIS) and creates a separate point layer whose globalIDs match their corresponding polygon features. An analyst runs this code manually after each phase of data collection; once the team receives a sufficient number of survey responses in Survey123, this code should be run to incorporate all features from the Survey123 response layer.
# import libraries
import pandas as pd
import numpy as np
import requests
import sys
import traceback
import usaddress
import datetime
from passyunk.parser import PassyunkParser
import re
import os
import arcgis
import arcpy
The output from the code block below is used to track who last edited the Notebook and when it was last updated.
# run only when you edit the Notebook
from IPython.display import display, Markdown
from datetime import date
today = str(date.today()) # get date
currentuser = os.environ.get("USERNAME") # get username of person updating - must be signed in
display(Markdown("#### Author: Evelyn Gorey <br> Last run by " + currentuser + " on " + today))
# set Pro project
aprx = arcpy.mp.ArcGISProject("CURRENT")
mp = aprx.listMaps('Map')[0] # first map in project
# create a variable that represents the default file geodatabase
fgdb = r'' # set path
aprx.defaultGeodatabase = fgdb
arcpy.env.workspace = fgdb
arcpy.env.overwriteOutput=True
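A quick optional sanity check: confirm that the geodatabase path exists before any exports run. This is a minimal addition that assumes fgdb was set to a valid path above.
# optional: fail fast if the geodatabase path was not set correctly
assert arcpy.Exists(fgdb), f"File geodatabase not found: {fgdb}"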
Connect to ArcGIS Online and set your proxy.
from arcgis.gis import GIS
proxy = {
'http': 'proxy.phila.gov:8080/',
'https': 'proxy.phila.gov:8080/',
}
gis = GIS(proxy=proxy)
Add the Survey123 feature to the Contents pane.
# define function to get the programs layers from AGO
def getprogramlayers():
    # add the main survey layer from AGO (from layer group) to the contents pane; it keeps the name of the AGO feature
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/service_6ba2cfee114148b286a8305b50073904/FeatureServer/0')
# get 2nd polygon (repeat) from AGO (from same layer group)
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/service_6ba2cfee114148b286a8305b50073904/FeatureServer/1')
    # create a list of the resulting layers in the contents pane (automatically named as they appear in AGO)
agolyrs = ['Asset_Map_Survey_Test', 'rptPoly2']
# iterate over the layers in the list to export layers to fgdb
for lyr in agolyrs:
# assign the base names for the output feature classes
base_name = "asset_programs" if "Asset_Map_Survey_Test" in lyr else "asset_programs_poly2"
# export
arcpy.conversion.ExportFeatures(
in_features=lyr,
out_features=os.path.join(fgdb, base_name)
)
# delete intermediates from contents pane and geodatabase
arcpy.management.Delete(["rptPoly2", "Asset_Map_Survey_Test"])
# execute function
getprogramlayers()
Add the necessary City boundary layers (police/PPD districts and zip codes) to the Contents pane. Create a citywide boundary layer from the zip codes by dissolving them.
# define function to get the boundary layers from AGO
def getboundarylayers():
    # add zip code AGO layer (from maps.phl) to the contents pane; it keeps the name of the AGO feature
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Zipcodes_Poly/FeatureServer/0') # 0 is first polygon, 1 is second
# add PPD districts AGO layer (from maps.phl) to contents pane
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Boundaries_District/FeatureServer/0')
# create list of features to copy to geodatabase
boundarylyrs = ["GIS_STREETS.Zipcodes_Poly", "Police Districts"]
# iterate over the layers in the list to copy features to the file geodatabase with specified names
for lyr in boundarylyrs:
# assign the base names for the output feature classes
base_name = "zips_poly" if "GIS_STREETS.Zipcodes_Poly" in lyr else "ppd_poly"
# copy
arcpy.management.CopyFeatures(
in_features=lyr,
out_feature_class=os.path.join(fgdb, base_name)
)
# create citywide layer (dissolved output of zipcodes)
arcpy.management.Dissolve("zips_poly", os.path.join(fgdb, "citywide"))
# remove unnecessary intermediate layers from contents pane
arcpy.management.Delete(["GIS_STREETS.Zipcodes_Poly", "Police Districts"])
# execute function
getboundarylayers()
I found that the easiest way to clean up the survey output ArcGIS Online layer is to convert it to a spatial dataframe first.
# bring in layer as spatial dataframe for easier field manipulation
sdf = pd.DataFrame.spatial.from_featureclass("asset_programs")
Clean columns, only keep what is necessary, and fill in any missing cell values to make it easier to filter/apply geometry edits in the future.
# pipe all data cleaning steps
sdf = (sdf
       # normalize flag columns: anything that isn't 'y' becomes 'n' (values are inconsistent because the survey changed over time)
.assign(citywide=np.where(sdf['citywide'] == 'y', 'y', 'n'),
poly2=np.where(sdf['poly2'] == 'y', 'y', 'n'),
# if boundary geometry type preference isn't set, default to zip code
zip_ppd=np.where(sdf['zip_ppd'] == 'ppd', 'ppd', 'zip'),
# if citywide = y, populate with all possible zip codes
zip_name=np.where((sdf['citywide'] == 'y'),
['19102,19103,19104,19106,19107,19109,19111,19112,19114,19115,19116,19118,19119,19120,19121,19122,19123,19124,19125,19126,19127,19128,19129,19130,19131,19132,19133,19134,19135,19136,19137,19138,19139,19140,19141,19142,19143,19144,19145,19146,19147,19148,19149,19150,19151,19152,19153,19154'],
sdf['zip_name']),
# if citywide = y, populate with all possible PPD districts
ppd_name=np.where((sdf['citywide'] == 'y'),
['1,3,17,2,5,6,7,8,77,39,35,26,25,24,22,19,18,16,15,14,12,9'],
sdf['ppd_name']),
# create a column to indicate when program's service area was drawn by the respondent -> that boundary should be what shows on the map
self_drawn_bound=np.where(pd.isna(sdf['SHAPE']), 'n', 'y'))
# rename columns to be more descriptive
.rename(columns={'zip_name': 'zip_service', 'ppd_name': 'ppd_service'})
# delete unnecessary columns
.drop(columns=['list_contact', 'name', 'CreationDate', 'Creator',
'EditDate', 'Editor', 'district_boundary', 'list_capacity',
'services_length', 'services_length_year', 'services_length_month',
'description', 'ppd_yesno', 'prog_description', 'feedback',
'rptPoly2_count'])
)
The next code block cleans the address field and parses it into three fields: street address, city, and zip code. The City's passyunk parser, developed by the Office of Innovation & Technology (OIT), is used to create the street address column.
# set parser
p = PassyunkParser()
## 1. Create the functions to extract parts of the address column - could probably condense these into one function
# street address - uses passyunk
def extract_street(address):
parsed = p.parse(address)
if 'components' in parsed:
return re.sub(r'\w+', lambda m:m.group(0).capitalize(), # title case
parsed['components']['output_address'])
else:
return address
# city name - uses usaddress
def extract_city(address):
parsed = usaddress.tag(address)[0]
if 'PlaceName' in parsed:
return parsed['PlaceName']
else:
return address
# zip code - uses usaddress
def extract_zip(address):
parsed = usaddress.tag(address)[0]
if 'ZipCode' in parsed:
return parsed['ZipCode']
else:
return address
## 2. Add and populate the new columns for each
sdf = (sdf
.assign(address_street = sdf['address'].apply(extract_street),
address_city = sdf['address'].apply(extract_city),
address_zipcode = sdf['address'].apply(extract_zip))
)
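As a quick sanity check, the extractors can be run on a sample address. The address below is hypothetical (not from the survey data), and the exact street string depends on passyunk's standardization.
# hypothetical address for illustration only
sample = '1234 Market St Philadelphia PA 19107'
print(extract_street(sample))  # passyunk-standardized street, e.g. '1234 Market St'
print(extract_city(sample))    # 'Philadelphia'
print(extract_zip(sample))     # '19107'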
Reorder the columns for better organization and preview all changes.
# reorder the columns
sdf = sdf[['OBJECTID', #'globalid',
'program_name', 'program_type', 'progtype_other',
'services', 'services_other', 'referral_services', 'referral_services_other',
'self_refer', 'capacity', 'service_length', 'desc_serviceprov', 'desc_spservices',
'desc_eligibility', 'comments', 'email', 'phone', 'website', 'address', 'address_street',
'address_city', 'address_zipcode','zip_service', 'ppd_service', 'citywide',
'zip_ppd', 'self_drawn_bound', 'poly2', 'SHAPE']]
# preview all changes
sdf
The next code block exports the layer to a feature class so that the addresses can be geocoded and the empty geometries can be filled using arcpy.
# export sdf to feature class - overwrite existing
sdf.spatial.to_featureclass(location=os.path.join(fgdb, "asset_programs"))
When a respondent takes the survey, the geometry section asks whether the program operates citywide; if not, the respondent is asked to list the zip codes the program serves. Zip codes are required for non-citywide programs, and an optional follow-up question asks for the PPD districts the program serves.
Many programs have opted to list PPD districts as well as zip codes. For the ones that did not, the Finder's information card would be cleaner and more standardized if every program displayed PPD districts, not just those that opted to provide them.
To achieve this, we can use our zip code and PPD district feature layers to find which PPD districts intersect the zip codes listed for each program. The two sets of geometries do not align perfectly (e.g., their boundary lines in the middle of the Schuylkill River differ slightly, which could falsely return a PPD district that overlaps only a tiny sliver of a zip code). To guard against that, for each PPD district that intersects a zip code, we calculate the share of the zip code's area covered by the intersection; districts covering less than 5% of the zip code are not returned.
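To make the 5% rule concrete, here is a minimal sketch with made-up areas; the real code below computes these from arcpy geometries.
# illustrative numbers only - not real district/zip geometries
zip_area = 10_000_000        # area of the zip code polygon
intersection_area = 350_000  # area it shares with one PPD district
ratio = intersection_area / zip_area  # 0.035
print(ratio >= 0.05)  # False -> this sliver of a district would be excluded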
# define function
def matchppdstr(programlyr, ppddists, zipcodes, progzipfld, progppdfld, ppdfld,
zipfld):
# add a new PPD district ID field in the PPD district layer to match programs layer
arcpy.AddField_management(ppddists, ppdfld, "TEXT")
# remove leading zeros in the new PPD district ID field to match programs layer
with arcpy.da.UpdateCursor(ppddists, [ppdfld, "DIST_NUMC"]) as cursor:
for row in cursor:
row[0] = row[1].lstrip("0")
cursor.updateRow(row)
# create dictionaries for police districts and zip codes - IDs and geometries
districts_dict = {row[0]: row[1] for row in arcpy.da.SearchCursor(ppddists, [ppdfld, "SHAPE@"])}
zipcodes_dict = {row[0]: row[1] for row in arcpy.da.SearchCursor(zipcodes, [zipfld, "SHAPE@"])}
# define function to get intersecting districts for a given set of zip codes
def get_intersecting_districts(zip_codes_str):
# initialize set to prevent ID from being added multiple times per row
intersecting_districts = set()
# split string field
zip_codes_list = zip_codes_str.split(',')
# define geometry part of dictionary and calculate intersection of zips and PPD dists
        for zip_code_id in zip_codes_list:
            # strip whitespace and skip any zip code not found in the layer
            zip_code_geom = zipcodes_dict.get(zip_code_id.strip())
            if zip_code_geom is None:
                continue
for district_id, district_geom in districts_dict.items():
                # the second argument is the output dimension: 4 = polygon intersection
                intersection_geom = district_geom.intersect(zip_code_geom, 4)
                # skip empty intersections before computing the overlap ratio
                if intersection_geom.area > 0:
# get the % in which the PPD district overlaps the zip code
ratio = intersection_geom.area / zip_code_geom.area
# return only districts that make up >= 5% of total intersecting zip code
if ratio >= 0.05:
intersecting_districts.add(district_id)
return intersecting_districts
# update police districts field ONLY if it isn't already filled out
where_clause = f"{progzipfld} IS NOT NULL AND ({progppdfld} IS NULL OR {progppdfld} = '')"
    # counter starts at 0; each row matching where_clause increments it by 1
selected_rows_count = 0
# update cursor / rows
with arcpy.da.UpdateCursor(programlyr, [progzipfld, progppdfld], where_clause) as cursor:
for row in cursor:
# each row that meets where_clause criteria increases counter by 1
selected_rows_count += 1
# add corresponding ID for each row that meets criteria
zip_codes_str = row[0]
# applies our function to return intersecting PPD districts
intersecting_districts = get_intersecting_districts(zip_codes_str)
            # update the ppd_service field with a comma-separated, numerically sorted list of matching districts
            row[1] = ",".join(sorted(intersecting_districts, key=int))
cursor.updateRow(row)
# return number of rows processed - check if correct
print(f"Selected Rows: {selected_rows_count}")
# execute function
matchppdstr("asset_programs", "ppd_poly", "zips_poly", "zip_service", "ppd_service",
"ppd_id_prog", "CODE")
We want a consistent schema each time we run this notebook and export the layers to AGO. To achieve this, we can write some code to enforce the field names and lengths we specify, since the fields currently all have inconsistent lengths.
Because there is no known way in arcpy to edit the lengths of existing fields, we use a workaround: export (copy) the feature with the contents of each field, and use field mapping to set the new field lengths on the copy.
Future work can include finding a way to edit existing field lengths or add new fields of the proper lengths within the original layer.
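One candidate for that future work, sketched here but untested: arcpy.management.AlterField accepts a field_length parameter for text fields, though support for changing the length of a populated field varies by ArcGIS Pro release, so this would need to be verified before it could replace the workaround below. It uses the same field_lengths dictionary defined further down.
# untested alternative: try altering text field lengths in place
for field_name, field_length in field_lengths.items():
    try:
        arcpy.management.AlterField("asset_programs", field_name,
                                    field_length=field_length)
    except arcpy.ExecuteError:
        print(f"Could not alter '{field_name}' in place")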
# define function
def set_field_lengths(programs, field_lengths):
# create a variable of the programs layer name
shapefile_name = arcpy.Describe(programs).name
    # build the path for a temporary copy of the programs layer
    temp_feature_class = os.path.join(fgdb, shapefile_name + "_temp")
# create a field mapping object
field_mappings = arcpy.FieldMappings()
# loop through each field and create a new field with the specified length in the field mapping
for field_name, field_length in field_lengths.items():
try:
# Create a new field mapping
field_map = arcpy.FieldMap()
field_map.addInputField(programs, field_name)
# Create a new field with the specified length in the field mapping
new_field = arcpy.Field()
new_field.name = field_name
new_field.type = field_map.outputField.type
new_field.length = field_length
field_map.outputField = new_field
# Add the field mapping to the field mappings object
field_mappings.addFieldMap(field_map)
except arcpy.ExecuteError as e:
error_message = f"Error setting field length for '{field_name}': {e}"
print(error_message)
try:
# export features using field mapping
arcpy.conversion.ExportFeatures(programs, temp_feature_class, field_mapping=field_mappings)
# delete the original feature class
arcpy.management.Delete(os.path.join(fgdb, programs))
        # rename the copied feature class to the original name
arcpy.management.Rename(temp_feature_class, shapefile_name)
print("Field length changes completed successfully.")
except arcpy.ExecuteError as e:
error_message = f"Error exporting features with field mapping: {e}"
print(error_message)
# set path of programs feature
programs = "asset_programs"
# set field length dictionary
field_lengths = {'program_name': 75, 'program_type': 15, 'progtype_other': 50,
'services': 150, 'services_other': 255, 'referral_services': 150, 'referral_services_other': 255,
'self_refer': 13, 'capacity': 10, 'service_length': 6, 'desc_serviceprov': 255, 'desc_spservices': 255,
'desc_eligibility': 255, 'comments': 255, 'email': 100, 'phone': 12, 'website': 255, 'address': 100,
'address_street': 100, 'address_city': 25, 'address_zipcode': 5, 'zip_service': 300, 'ppd_service': 100,
'citywide': 1, 'zip_ppd': 10, 'self_drawn_bound': 1, 'poly2': 1}
# execute function
set_field_lengths(programs, field_lengths)
We can print the fields of the new layer to make sure everything looks good. We're specifically interested in the string fields since they're the only ones we changed.
# function to print existing fields
def print_fields(shapefile):
# Use ListFields to get a list of field objects
fields = arcpy.ListFields(shapefile)
# Print field information
for field in fields:
print(f"Field Name: {field.name}")
print(f" Type: {field.type}")
print(f" Length: {field.length}")
# execute function
print_fields("asset_programs")
Geocoding the headquarters address with AIS adds and populates X and Y coordinate fields on the layer we were just working with, but only for Philadelphia addresses. (AIS was used instead of other geocoders because of its accuracy and because it does not consume ArcGIS credits.) A separate point feature then needs to be created from these coordinates and checked for accuracy.
The code below was adapted from the Office of Licenses & Inspections (L&I) for this particular use. It uses arcpy and the Philadelphia Address Information System (AIS), developed by OIT, to automatically parse each feature's address and add and populate coordinate fields representing the parcel address.
# select programs feature for input
input_table = os.path.join(fgdb, 'asset_programs')
# select input address field from programs feature
address = 'address'
# paste desired output fields here as python list (just need address and coordinates)
input_fields = ['street_address']
# select coordinate system for output coordinates: '2272' (NAD83 / Pennsylvania South (ftUS) ) OR '4326' (WGS 84 - Standard lat/lon coordinates)
coordinates = '4326'
# to get through proxy, you will need to enter your City username and password
user = 'evelyn.gorey'
password = ''
# you will need to specify your own unique AIS key
aiskey = ''
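Before running the full loop, it can help to preview the shape of a single AIS response. This minimal sketch assumes a valid aiskey and direct network access (add a proxies argument if running behind the City proxy); the address is hypothetical.
# single AIS lookup to preview the response structure the loop below relies on
r = requests.get('http://api.phila.gov/ais/v1/search/1234 MARKET ST',
                 params={'gatekeeperKey': aiskey, 'srid': coordinates})
resp = r.json()
if 'error' not in resp:
    feat = resp['features'][0]
    print(feat['match_type'], feat['geometry']['geocode_type'], feat['geometry']['coordinates'])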
# geocode using AIS - code from L&I that was adapted to this particular use
# more on AIS here: https://github.com/CityOfPhiladelphia/ais
basestring = str  # Python 2/3 compatibility alias kept from the original L&I script
start_time = datetime.datetime.now()
print('Beginning AIS Geocode Tool')
print(start_time)
print(coordinates)
print(address)
print(input_fields)
# ensures address field is included in input_fields if not already
if 'street_address' not in input_fields:
input_fields.insert(0, 'street_address')
print(input_fields)
# returns number of rows/programs to geocode
countin = int(arcpy.GetCount_management(input_table).getOutput(0))
count = 0
print('Found {0} records in input table.'.format(countin))
# progress breakpoints at every 10% of the input rows
breaks = [int(float(countin) * float(b) / 100.0) for b in range(10, 100, 10)]
lenList = []
infoDict = {}
addrCursor = arcpy.da.SearchCursor(input_table, address)
#addrCursor = ['6635 MCCALLUM ST'] # line for testing a single address
for addr in addrCursor:
if addr is not None:
try:
count += 1
if count in breaks:
print('Part 1 of 3: Parsing Input Addresses ' + str(int(round(count * 100.0 / countin))) + '% complete...')
# Part 1 -- US Address Parser
parsed = usaddress.tag(addr[0], tag_mapping={
'Recipient': 'recipient',
'AddressNumber': 'address1',
'AddressNumberPrefix': 'address1',
'AddressNumberSuffix': 'address1',
'StreetName': 'address1',
'StreetNamePreDirectional': 'address1',
'StreetNamePreModifier': 'address1',
'StreetNamePreType': 'address1',
'StreetNamePostDirectional': 'address1',
'StreetNamePostModifier': 'address1',
'StreetNamePostType': 'address1',
'CornerOf': 'address1',
'IntersectionSeparator': 'address1',
'LandmarkName': 'address1',
'USPSBoxGroupID': 'address1',
'USPSBoxGroupType': 'address1',
'USPSBoxID': 'address1',
'USPSBoxType': 'address1',
'BuildingName': 'address2',
'OccupancyType': 'address2',
'OccupancyIdentifier': 'address2',
'SubaddressIdentifier': 'address2',
'SubaddressType': 'address2',
'PlaceName': 'city',
'StateName': 'state',
'ZipCode': 'zip_code',
})
            # usaddress returns (OrderedDict, address_type); use .get() so
            # missing components fall back to None instead of raising
            tagged = parsed[0]
            address1 = tagged.get('address1')
            address2 = tagged.get('address2')
            zipcode = tagged.get('zip_code')
            city = tagged.get('city')
            state = tagged.get('state')
            paddress1 = {'address1': address1,
                         'address2': address2,
                         'zipcode': zipcode}
# Part 2 -- Passyunk Parser
p = PassyunkParser()
preP = paddress1['address1'] + ' ' + paddress1['address2'] if paddress1['address2'] is not None else paddress1['address1']
paddr = p.parse(preP)
# Part 3 -- AIS API
# if there is no unit, just find base address
if paddr['components']['address_unit']['unit_num'] is None:
proxyDict = {'http':'http://'+user+':' + password+'@proxy.phila.gov:8080', 'https': 'https://'+user+':' + password+'@proxy.phila.gov:8080'}
r = requests.get(
'http://api.phila.gov/ais/v1/search/' + paddr['components']['output_address'] + '?gatekeeperKey='+aiskey+'&srid='+coordinates, proxies=proxyDict)
aisString = r.json()
if 'error' not in aisString:
features = aisString['features']
properties = features[0]['properties']
                    # row payload: match type, each requested property, then geometry info,
                    # ordered to align with the update cursor fields used further down
                    addrInfo = [features[0]['match_type']] + [properties[f.lower()] for f in input_fields]
                    geometry = features[0]['geometry']
                    addrInfo.extend([geometry['geocode_type']] + list(geometry['coordinates']))
infoDict[addr[0]] = addrInfo
if not lenList:
lenList = [len(properties[f.lower()]) if properties[f.lower()] is not None and isinstance(properties[f.lower()], basestring) else len('|'.join(properties[f.lower()])) if isinstance(properties[f.lower()], list) else 1 for f in
input_fields]
else:
tempLenList = [len(properties[f.lower()]) if properties[f.lower()] is not None and isinstance(properties[f.lower()], basestring) else len('|'.join(properties[f.lower()])) if isinstance(properties[f.lower()], list) else 1 for f in
input_fields]
for i, l in enumerate(lenList):
if tempLenList[i] > l:
lenList[i] = tempLenList[i]
del tempLenList
# if there is a unit, find unit information
else:
proxyDict = {'http':'http://'+user+':' + password+'@proxy.phila.gov:8080', 'https': 'https://'+user+':' + password+'@proxy.phila.gov:8080'}
r = requests.get(
                    'http://api.phila.gov/ais/v1/search/' + paddr['components']['output_address'] + '?include_units&gatekeeperKey='+aiskey+'&srid='+coordinates, proxies=proxyDict)
aisString = r.json()
if 'error' not in aisString:
features = aisString['features']
properties = features[0]['properties']
                    # same payload ordering as the no-unit branch above
                    addrInfo = [features[0]['match_type']] + [properties[f.lower()] for f in input_fields]
                    geometry = features[0]['geometry']
                    addrInfo.extend([geometry['geocode_type']] + list(geometry['coordinates']))
infoDict[addr[0]] = addrInfo
if not lenList:
lenList = [len(properties[f.lower()]) if properties[f.lower()] is not None else 1 for f in
input_fields]
else:
tempLenList = [len(properties[f.lower()]) if properties[f.lower()] is not None else 1 for f in
input_fields]
for i, l in enumerate(lenList):
if tempLenList[i] > l:
lenList[i] = tempLenList[i]
del tempLenList
        except arcpy.ExecuteError:
            print('Row Failure')
            msgs = arcpy.GetMessages(2)
            print(msgs)
        except Exception:
            print(addr[0])
            print('Row Failure')
            tb = sys.exc_info()[2]
            tbinfo = traceback.format_tb(tb)[0]
            pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
            print(pymsg)
del addrCursor
del count
arcpy.AddField_management(input_table, field_name='MATCH_TYPE', field_type='TEXT', field_length='20')
print('Adding fields to original table')
existingFields = [f.name for f in arcpy.ListFields(input_table)]
fixDict = {}
for i, f in enumerate(input_fields):
    # while building new fields, check that they don't already exist; if they
    # do, append '_1' to the new field name and update input_fields
    fieldDup = None
    while f in existingFields:
        f = f + '_1'
        fieldDup = i
try:
arcpy.AddField_management(in_table=input_table, field_name=f, field_type='TEXT', field_length=lenList[i])
    except arcpy.ExecuteError:
        print('Field Failure: ' + f)
        msgs = arcpy.GetMessages(2)
        print(msgs)
    except Exception:
        print('Field Failure: ' + f)
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
        print(pymsg)
if fieldDup is not None:
fixDict[i] = f
del lenList
# update field names in input_fields if duplicates are present
if len(fixDict) > 0:
for i, f in fixDict.items():
input_fields[i] = f
print('Add Geometry Fields')
arcpy.AddField_management(input_table, field_name='geocode_type', field_type='TEXT', field_length='20')
arcpy.AddField_management(input_table, field_name='GEOCODE_X', field_type='DOUBLE')
arcpy.AddField_management(input_table, field_name='GEOCODE_Y', field_type='DOUBLE')
count = 0
infoCursor = arcpy.da.UpdateCursor(input_table, [address, 'MATCH_TYPE'] + input_fields+['geocode_type', 'GEOCODE_X', 'GEOCODE_Y'])
for info in infoCursor:
try:
count += 1
if count in breaks:
print('Updating original table ' + str(int(round(count * 100.0 / countin))) + '% complete...')
if info[0] in infoDict:
aisData = infoDict.get(info[0])
dCount = 1
for data in aisData:
if isinstance(data, list):
info[dCount] = '|'.join(data)
else:
info[dCount] = data
dCount += 1
infoCursor.updateRow(info)
    except arcpy.ExecuteError:
        print('Script Failure')
        msgs = arcpy.GetMessages(2)
        print(msgs)
    except Exception:
        print('Script Failure')
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
        print(pymsg)
del infoCursor
print('AIS Data Successfully Updated')
print(datetime.datetime.now() - start_time)
Now that the X & Y coordinates have been added to the original feature using AIS, a new point feature can be created to display these points. The feature must first be exported to a table since the feature, which is currently a polygon feature, cannot be exported to a point feature.
Once the feature has been exported to a table, the point feature can be created uisng the X and Y coordinate fields. The coordinate system is still WGS 1984 since that's how the points needed to be geocoded, but we can reproject the new feature to be StatePlane to match the polygon feature.
# define function to combine steps
def geocoded_points(input_table, points):
    # export polygon layer w/ coordinates as a table to the geodatabase - required for the XY step
arcpy.conversion.ExportTable(
in_table=input_table,
out_table=os.path.join(fgdb, "pointstable"))
# create point table from feature
arcpy.management.XYTableToPoint(
in_table="pointstable",
out_feature_class=os.path.join(fgdb, "points_xy"),
x_field="GEOCODE_X",
y_field="GEOCODE_Y",
        # the input coordinates are WGS 1984 lat/lon (how AIS returned them)
coordinate_system='GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]];-400 -400 1000000000;-100000 10000;-100000 10000;8.98315284119521E-09;0.001;0.001;IsHighPrecision')
# reproject the point feature to StatePlane Pennsylvania South
arcpy.management.Project(
in_dataset="points_xy",
out_dataset=os.path.join(fgdb, points),
out_coor_system='PROJCS["NAD_1927_StatePlane_Pennsylvania_South_FIPS_3702",GEOGCS["GCS_North_American_1927",DATUM["D_North_American_1927",SPHEROID["Clarke_1866",6378206.4,294.9786982]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",2000000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-77.75],PARAMETER["Standard_Parallel_1",39.93333333333333],PARAMETER["Standard_Parallel_2",40.96666666666667],PARAMETER["Latitude_Of_Origin",39.33333333333334],UNIT["Foot_US",0.3048006096012192]]',
transform_method="'WGS_1984_(ITRF00)_To_NAD_1983 + NAD_1927_To_NAD_1983_NADCON'",
in_coor_system='GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]')
    # delete the intermediate table and unprojected point layer
arcpy.management.Delete([os.path.join(fgdb, "pointstable"), os.path.join(fgdb, "points_xy")])
# execute function
geocoded_points("asset_programs", "inj_hqs")
The code block below removes unnecessary fields from the new point feature.
# define function to combine steps
def hqfieldclean(hqs):
    # get the fields that actually exist on the layer
    fields = [f.name for f in arcpy.ListFields(hqs)]
    # fields we no longer need; keep only those present to avoid delete errors
    del_fields = [f for f in ['zip_ppd', 'self_drawn_bound', 'poly2', 'MATCH_TYPE',
                              'opa_address', 'street_address', 'geocode_type',
                              'GEOCODE_X', 'GEOCODE_Y'] if f in fields]
    # delete the specified fields
    arcpy.DeleteField_management(hqs, del_fields)
# execute function
hqfieldclean("inj_hqs")
Loaded in the Contents pane are the zip code and PPD district features, pulled directly from AGO. A citywide boundary feature has also been created by dissolving the zip codes, which keeps the boundaries reasonably clean (the PPD districts and zip codes already don't align perfectly). Using these three features (zip codes, PPD districts, and citywide), the goal is to update the empty polygon geometry of each program/row in the original Survey123 feature with the corresponding geometry from one of the three features, depending on which geographic unit the program selected for its service area boundary. The other provided fields determine which of the three geometry types the code applies: whether the program selected citywide ('y'), whether it prefers zip codes or PPD districts when citywide is 'n', and whether it drew its own boundary.
The zip code layer was dissolved beforehand to ensure that the citywide boundary matches the zip codes well. The resulting citywide layer then gets an added field indicating that it is citywide, to match the survey output layer.
The following code ensures that programs with citywide service area boundaries now have geometries that cover the entire city.
# define function to add the citywide geometry to rows where citywide = 'y'
def addcitywidegeom(citywidelyr, citywidefield, programlyr, programfield):
# add a citywide indicator field in the citywide layer to match programs layer
arcpy.AddField_management(citywidelyr, citywidefield, "TEXT")
# populate the new citywide indicator field to match programs layer
with arcpy.da.UpdateCursor(citywidelyr, [citywidefield]) as cursor:
for row in cursor:
row[0] = "y"
cursor.updateRow(row)
# create a dictionary storing the citywide geometry for citywide = 'y'
geometries = {key:value for (key,value) in arcpy.da.SearchCursor(citywidelyr, [citywidefield, 'SHAPE@'])}
# empty list to store ids in program layer not found in the citywide layer
notfound = []
# update program layer with geometries from citywide layer where citywide fields match
with arcpy.da.UpdateCursor(programlyr, [programfield, 'SHAPE@']) as cursor:
for row in cursor:
try:
row[1] = geometries[row[0]]
cursor.updateRow(row)
            except KeyError:
                notfound.append(row[0])
    # any response that isn't citywide will simply land in this "not found" list
    print('Found no id match and could not update geometry for IDs: ', notfound)
# execute function
addcitywidegeom("citywide", "citywide", "asset_programs", "citywide")
For survey entries that have zip codes listed as the service area boundary and prefer zip codes over PPD districts, the zip codes layer will be used to add corresponding zip code geometries to each entry.
The code has to list the survey entry output layer's separated zip code columns and match them to the zip code layer by zip code name; geometries for the survey entry output will then be replaced by the corresponding zip codes from the zip code layer. The zip codes will then be merged and dissolved into one boundary. All other attributes should remain the same for the survey entry layer.
# create function to add merged and dissolved zip code geometries to applicable rows
def addzipgeoms(ziplyr, zipfield, programlyr, programfield):
# set up a temporary workspace on disk
temp_workspace = arcpy.env.scratchGDB
# create a dictionary of zip codes and their geometries from the zip code feature
geometries = {key: value for (key, value) in arcpy.da.SearchCursor(ziplyr, [zipfield, 'SHAPE@'])}
# list to store dissolved geometries for each program row
dissolved_geometries = []
# SQL query to only update programs that aren't citywide or prefer to match PPD boundaries
sql_query = "citywide = 'n' AND self_drawn_bound = 'n' AND zip_ppd = 'zip'"
# loop over each program row
with arcpy.da.SearchCursor(programlyr, [programfield, 'SHAPE@'], sql_query) as cursor:
for row in cursor:
print(f"Processing row with zip codes: {row[0]}")
# split the string list of zip codes in the zip_service field
zip_codes = row[0].split(',')
# get the geometries for each zip code from the dictionary
temp_geoms = [geometries.get(zip_code.strip()) for zip_code in zip_codes if zip_code.strip() in geometries]
try:
# merge and dissolve geometries for each individual row, store as temps
if temp_geoms:
temp_merged = arcpy.Merge_management(temp_geoms, os.path.join(temp_workspace, "tempMerged"))
temp_dissolved = arcpy.Dissolve_management(temp_merged, os.path.join(temp_workspace, "tempDissolved"))
# create a temporary feature class for the dissolved geometry
temp_fc = os.path.join(temp_workspace, "tempDissolvedFC")
arcpy.CopyFeatures_management(temp_dissolved, temp_fc)
# get the first feature from the dissolved geometry
with arcpy.da.SearchCursor(temp_fc, ['SHAPE@']) as dissolve_cursor:
for dissolve_row in dissolve_cursor:
dissolved_geometry = dissolve_row[0]
break
# add dissolved geometry to the list
dissolved_geometries.append(dissolved_geometry)
# clean up the temporary feature classes
arcpy.management.Delete([temp_merged, temp_dissolved, temp_fc])
# if multipart geometry error is raised, skip
except arcpy.ExecuteError:
print(f"Skipping row due to multipart geometry issue.")
pass
    # update all matching rows with the dissolved geometries (relies on the search and
    # update cursors returning rows in the same order, with no rows skipped above)
with arcpy.da.UpdateCursor(programlyr, ['SHAPE@'], sql_query) as update_cursor:
for dissolved_geometry in dissolved_geometries:
# update each row geometry with the corresponding dissolved temp geometry
update_row = next(update_cursor)
update_row[0] = dissolved_geometry
update_cursor.updateRow(update_row)
# clean up by deleting temporary objects
arcpy.Delete_management(temp_workspace)
# execute function
addzipgeoms("zips_poly", "CODE", "asset_programs", "zip_service")
For survey entries that have PPD districts listed as the service area boundary and prefer PPD districts over zip codes, the PPD districts layer will be used to add corresponding PPD district geometries to each entry.
Because PPD district topology is messier than the zip codes (there are holes/gaps in the polygons that should be filled for accuracy and aesthetics), the Eliminate Polygon Part tool is incorporated in this code block to fill contained gaps smaller than 10% of the total area.
# create function to add merged and dissolved PPD district geometries to applicable rows
def addppdgeoms(ppdlyr, ppdfield, programlyr, programfield):
# set up a temporary workspace on disk
temp_workspace = arcpy.env.scratchGDB
# create a dictionary of PPD districts and their geometries from the PPD district feature
geometries = {key: value for (key, value) in arcpy.da.SearchCursor(ppdlyr, [ppdfield, 'SHAPE@'])}
# list to store dissolved geometries for each program row
dissolved_geometries = []
    # select only programs that aren't citywide, didn't draw their own boundary, and prefer PPD districts
sql_query = "citywide = 'n' AND self_drawn_bound = 'n' AND zip_ppd = 'ppd'"
# loop over each program row
with arcpy.da.SearchCursor(programlyr, [programfield, 'SHAPE@'], sql_query) as cursor:
for row in cursor:
print(f"Merging and dissolving row with PPD districts: {row[0]}")
            # split the string list of PPD districts in the ppd_service field
            ppd_districts = row[0].split(',')
            # get the geometries for each PPD district from the dictionary
temp_geoms = [geometries.get(ppd_district.strip()) for ppd_district in ppd_districts if ppd_district.strip() in geometries]
try:
# merge and dissolve geometries for each individual row, store as temps
if temp_geoms:
temp_merged = arcpy.Merge_management(temp_geoms, os.path.join(temp_workspace, "tempMerged"))
temp_dissolved = arcpy.Dissolve_management(temp_merged, os.path.join(temp_workspace, "tempDissolved"))
# create a temporary feature class for the dissolved geometry
temp_fc = os.path.join(temp_workspace, "tempDissolvedFC")
arcpy.CopyFeatures_management(temp_dissolved, temp_fc)
# get the first feature from the dissolved geometry
with arcpy.da.SearchCursor(temp_fc, ['SHAPE@']) as dissolve_cursor:
for dissolve_row in dissolve_cursor:
dissolved_geometry = dissolve_row[0]
break
# add dissolved geometry to the list
dissolved_geometries.append(dissolved_geometry)
# clean up the temporary feature classes
arcpy.Delete_management([temp_merged, temp_dissolved, temp_fc])
# if multipart geometry error is raised, skip
except arcpy.ExecuteError:
print(f"Skipping row due to multipart geometry issue.")
pass
    # update all matching rows with the dissolved geometries (same ordering caveat as above)
with arcpy.da.UpdateCursor(programlyr, ['SHAPE@'], sql_query) as update_cursor:
for dissolved_geometry in dissolved_geometries:
# update each row with the corresponding dissolved geometry
update_row = next(update_cursor)
update_row[0] = dissolved_geometry
update_cursor.updateRow(update_row)
# fill gaps in geometry due to poor quality PPD district topology
arcpy.management.EliminatePolygonPart(programlyr, os.path.join(fgdb, "geom_nogaps"),
condition="PERCENT", part_area_percent=10,
part_option="CONTAINED_ONLY")
    # Eliminate Polygon Part doesn't allow overwriting features, so copy the output back over the programs layer
arcpy.management.CopyFeatures("geom_nogaps", os.path.join(fgdb, programlyr))
# clean up by deleting temporary objects
arcpy.Delete_management([temp_workspace, os.path.join(fgdb, "geom_nogaps")])
# execute function
addppdgeoms("ppd_poly", "ppd_id_prog", "asset_programs", "ppd_service")
The following code block clips all program boundaries to the dissolved citywide boundary to keep the boundaries fairly clean, since the PPD districts layer tends to extend beyond the zip codes layer. The code also sorts the programs in the resulting feature so that citywide programs sit at the bottom of the feature, for visualization purposes: in a web app, it is easiest for a user to click and see the attributes of programs with smaller boundaries when the citywide boundaries don't sit on top of them.
# define function
def clipsort(x):
# clip to citywide boundary for cleaning
arcpy.analysis.Clip(x, "citywide", "asset_programs_clean")
# sort the order so that citywide boundaries are shown below non-citywide
arcpy.management.Sort("asset_programs_clean",
os.path.join(fgdb, "asset_programs"),
sort_field="citywide ASCENDING", spatial_sort_method="UR")
# delete intermediate layer - arcpy doesn't allow overwriting
arcpy.management.Delete(os.path.join(fgdb, "asset_programs_clean"))
# execute function
clipsort("asset_programs")