Purpose of Notebook: to display our injury prevention partner programs on an asset map. Each partner program fills out our Survey123 form, which collects information on the program's services and service area boundaries. Each survey entry is meant to produce both a point (the headquarters) and a polygon (the service area) for display on the final map. Currently, the only geometry drawn when a survey entry is submitted is the service area polygon, and only when the participant draws their own service area boundary.
Many of the service area boundaries align with zip codes or the city's boundary, but the Survey123 form does not automatically create these geometries when the service area boundary question is filled out. The corresponding geometries are therefore added after the survey entry is submitted.
The code updates the geometry of each row depending on the populated fields. For example, if a program's service area is citywide, that row automatically receives the citywide boundary built from dissolved zip codes. For programs that selected multiple zip codes, the code lists those zip codes, matches them to the zip codes layer by name, merges them, and dissolves them into a single boundary for the survey output layer. If a program chooses to display police districts instead, the same is done with police districts.
The code also parses and geocodes addresses using the Address Information System (AIS) and creates a separate point layer whose globalIDs match their corresponding polygon features. An analyst runs this code manually after each phase of data collection; once the team receives a sufficient number of survey responses in Survey123, this code should be run to incorporate all features from the Survey123 response layer.
# import libraries
import pandas as pd
import numpy as np
import requests
import sys
import traceback
import usaddress
import datetime
from passyunk.parser import PassyunkParser
import re
import os
import arcgis
import arcpy
The output from the code block below is used to track who last edited the Notebook and when it was last updated.
# run only when you edit the Notebook
from IPython.display import display, Markdown
from datetime import date
today = str(date.today()) # get date
currentuser = os.environ.get("USERNAME") # get username of person updating - must be signed in
display(Markdown("#### Author: Evelyn Gorey <br> Last run by " + currentuser + " on " + today))
# set Pro project
aprx = arcpy.mp.ArcGISProject("CURRENT")
mp = aprx.listMaps('Map')[0] # first map in project
# create a variable that represents the default file geodatabase
fgdb = r'' # set path
aprx.defaultGeodatabase = fgdb
arcpy.env.workspace = fgdb
arcpy.env.overwriteOutput=True
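A quick optional sanity check: confirm that the geodatabase path exists before any exports run. This is a minimal addition that assumes fgdb was set to a valid path above.
# optional: fail fast if the geodatabase path was not set correctly
assert arcpy.Exists(fgdb), f"File geodatabase not found: {fgdb}"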
Connect to ArcGIS Online and set your proxy.
from arcgis.gis import GIS
proxy = {
'http': 'proxy.phila.gov:8080/',
'https': 'proxy.phila.gov:8080/',
}
gis = GIS(proxy=proxy)
Add the Survey123 feature to the Contents pane.
# define function to get the programs layers from AGO
def getprogramlayers():
    # add the main survey layer from AGO (from layer group) to the contents pane; it keeps the name of the AGO feature
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/service_6ba2cfee114148b286a8305b50073904/FeatureServer/0')
# get 2nd polygon (repeat) from AGO (from same layer group)
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/service_6ba2cfee114148b286a8305b50073904/FeatureServer/1')
    # create a list of the resulting layers in the contents pane (automatically named as they appear in AGO)
agolyrs = ['Asset_Map_Survey_Test', 'rptPoly2']
# iterate over the layers in the list to export layers to fgdb
for lyr in agolyrs:
# assign the base names for the output feature classes
base_name = "asset_programs" if "Asset_Map_Survey_Test" in lyr else "asset_programs_poly2"
# export
arcpy.conversion.ExportFeatures(
in_features=lyr,
out_features=os.path.join(fgdb, base_name)
)
# delete intermediates from contents pane and geodatabase
arcpy.management.Delete(["rptPoly2", "Asset_Map_Survey_Test"])
# execute function
getprogramlayers()
Add the necessary City boundary layers (police/PPD districts and zip codes) to the Contents pane. Create a citywide boundary layer from the zip codes by dissolving them.
# define function to get the boundary layers from AGO
def getboundarylayers():
    # add zip code AGO layer (from maps.phl) to the contents pane; it keeps the name of the AGO feature
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Zipcodes_Poly/FeatureServer/0') # 0 is first polygon, 1 is second
# add PPD districts AGO layer (from maps.phl) to contents pane
mp.addDataFromPath('https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Boundaries_District/FeatureServer/0')
# create list of features to copy to geodatabase
boundarylyrs = ["GIS_STREETS.Zipcodes_Poly", "Police Districts"]
# iterate over the layers in the list to copy features to the file geodatabase with specified names
for lyr in boundarylyrs:
# assign the base names for the output feature classes
base_name = "zips_poly" if "GIS_STREETS.Zipcodes_Poly" in lyr else "ppd_poly"
# copy
arcpy.management.CopyFeatures(
in_features=lyr,
out_feature_class=os.path.join(fgdb, base_name)
)
# create citywide layer (dissolved output of zipcodes)
arcpy.management.Dissolve("zips_poly", os.path.join(fgdb, "citywide"))
# remove unnecessary intermediate layers from contents pane
arcpy.management.Delete(["GIS_STREETS.Zipcodes_Poly", "Police Districts"])
# execute function
getboundarylayers()
I found that the easiest way to clean up the survey output ArcGIS Online layer is to convert it to a spatial dataframe first.
# bring in layer as spatial dataframe for easier field manipulation
sdf = pd.DataFrame.spatial.from_featureclass("asset_programs")
Clean columns, only keep what is necessary, and fill in any missing cell values to make it easier to filter/apply geometry edits in the future.
# pipe all data cleaning steps
sdf = (sdf
       # normalize flag columns: anything that isn't 'y' becomes 'n' (values are inconsistent because the survey changed over time)
.assign(citywide=np.where(sdf['citywide'] == 'y', 'y', 'n'),
poly2=np.where(sdf['poly2'] == 'y', 'y', 'n'),
# if boundary geometry type preference isn't set, default to zip code
zip_ppd=np.where(sdf['zip_ppd'] == 'ppd', 'ppd', 'zip'),
# if citywide = y, populate with all possible zip codes
zip_name=np.where((sdf['citywide'] == 'y'),
['19102,19103,19104,19106,19107,19109,19111,19112,19114,19115,19116,19118,19119,19120,19121,19122,19123,19124,19125,19126,19127,19128,19129,19130,19131,19132,19133,19134,19135,19136,19137,19138,19139,19140,19141,19142,19143,19144,19145,19146,19147,19148,19149,19150,19151,19152,19153,19154'],
sdf['zip_name']),
# if citywide = y, populate with all possible PPD districts
ppd_name=np.where((sdf['citywide'] == 'y'),
['1,3,17,2,5,6,7,8,77,39,35,26,25,24,22,19,18,16,15,14,12,9'],
sdf['ppd_name']),
# create a column to indicate when program's service area was drawn by the respondent -> that boundary should be what shows on the map
self_drawn_bound=np.where(pd.isna(sdf['SHAPE']), 'n', 'y'))
# rename columns to be more descriptive
.rename(columns={'zip_name': 'zip_service', 'ppd_name': 'ppd_service'})
# delete unnecessary columns
.drop(columns=['list_contact', 'name', 'CreationDate', 'Creator',
'EditDate', 'Editor', 'district_boundary', 'list_capacity',
'services_length', 'services_length_year', 'services_length_month',
'description', 'ppd_yesno', 'prog_description', 'feedback',
'rptPoly2_count'])
)
The next code block cleans the address field and parses it into three fields: street address, city, and zip code. The City's passyunk parser, developed by the Office of Innovation & Technology (OIT), is used to create the street address column.
# set parser
p = PassyunkParser()
## 1. Create the functions to extract parts of the address column - could probably condense these into one function
# street address - uses passyunk
def extract_street(address):
parsed = p.parse(address)
if 'components' in parsed:
return re.sub(r'\w+', lambda m:m.group(0).capitalize(), # title case
parsed['components']['output_address'])
else:
return address
# city name - uses usaddress
def extract_city(address):
parsed = usaddress.tag(address)[0]
if 'PlaceName' in parsed:
return parsed['PlaceName']
else:
return address
# zip code - uses usaddress
def extract_zip(address):
parsed = usaddress.tag(address)[0]
if 'ZipCode' in parsed:
return parsed['ZipCode']
else:
return address
## 2. Add and populate the new columns for each
sdf = (sdf
.assign(address_street = sdf['address'].apply(extract_street),
address_city = sdf['address'].apply(extract_city),
address_zipcode = sdf['address'].apply(extract_zip))
)
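As a quick sanity check, the extractors can be run on a sample address. The address below is hypothetical (not from the survey data), and the exact street string depends on passyunk's standardization.
# hypothetical address for illustration only
sample = '1234 Market St Philadelphia PA 19107'
print(extract_street(sample))  # passyunk-standardized street, e.g. '1234 Market St'
print(extract_city(sample))    # 'Philadelphia'
print(extract_zip(sample))     # '19107'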
Reorder the columns for better organization and preview all changes.
# reorder the columns
sdf = sdf[['OBJECTID', #'globalid',
'program_name', 'program_type', 'progtype_other',
'services', 'services_other', 'referral_services', 'referral_services_other',
'self_refer', 'capacity', 'service_length', 'desc_serviceprov', 'desc_spservices',
'desc_eligibility', 'comments', 'email', 'phone', 'website', 'address', 'address_street',
'address_city', 'address_zipcode','zip_service', 'ppd_service', 'citywide',
'zip_ppd', 'self_drawn_bound', 'poly2', 'SHAPE']]
# preview all changes
sdf
The next code block exports the layer to a feature class so that the addresses can be geocoded and the empty geometries can be filled using arcpy.
# export sdf to feature class - overwrite existing
sdf.spatial.to_featureclass(location=os.path.join(fgdb, "asset_programs"))
When a respondent takes the survey, the geometry section asks whether the program operates citywide; if not, the respondent is asked to list the zip codes the program serves. Zip codes are required for non-citywide programs, and an optional follow-up question asks for the PPD districts the program serves.
Many programs have opted to list PPD districts as well as zip codes. For the ones that did not, the Finder's information card would be cleaner and more standardized if every program displayed PPD districts, not just those that opted to provide them.
To achieve this, we can use our zip code and PPD district feature layers to find which PPD districts intersect the zip codes listed for each program. The two sets of geometries do not align perfectly (e.g., their boundary lines in the middle of the Schuylkill River differ slightly, which could falsely return a PPD district that overlaps only a tiny sliver of a zip code). To guard against that, for each PPD district that intersects a zip code, we calculate the share of the zip code's area covered by the intersection; districts covering less than 5% of the zip code are not returned.
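To make the 5% rule concrete, here is a minimal sketch with made-up areas; the real code below computes these from arcpy geometries.
# illustrative numbers only - not real district/zip geometries
zip_area = 10_000_000        # area of the zip code polygon
intersection_area = 350_000  # area it shares with one PPD district
ratio = intersection_area / zip_area  # 0.035
print(ratio >= 0.05)  # False -> this sliver of a district would be excluded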
# define function
def matchppdstr(programlyr, ppddists, zipcodes, progzipfld, progppdfld, ppdfld,
zipfld):
# add a new PPD district ID field in the PPD district layer to match programs layer
arcpy.AddField_management(ppddists, ppdfld, "TEXT")
# remove leading zeros in the new PPD district ID field to match programs layer
with arcpy.da.UpdateCursor(ppddists, [ppdfld, "DIST_NUMC"]) as cursor:
for row in cursor:
row[0] = row[1].lstrip("0")
cursor.updateRow(row)
# create dictionaries for police districts and zip codes - IDs and geometries
districts_dict = {row[0]: row[1] for row in arcpy.da.SearchCursor(ppddists, [ppdfld, "SHAPE@"])}
zipcodes_dict = {row[0]: row[1] for row in arcpy.da.SearchCursor(zipcodes, [zipfld, "SHAPE@"])}
# define function to get intersecting districts for a given set of zip codes
def get_intersecting_districts(zip_codes_str):
# initialize set to prevent ID from being added multiple times per row
intersecting_districts = set()
# split string field
zip_codes_list = zip_codes_str.split(',')
# define geometry part of dictionary and calculate intersection of zips and PPD dists
        for zip_code_id in zip_codes_list:
            # strip whitespace and skip any zip code not found in the layer
            zip_code_geom = zipcodes_dict.get(zip_code_id.strip())
            if zip_code_geom is None:
                continue
for district_id, district_geom in districts_dict.items():
                # the second argument is the output dimension: 4 = polygon intersection
                intersection_geom = district_geom.intersect(zip_code_geom, 4)
                # skip empty intersections before computing the overlap ratio
                if intersection_geom.area > 0:
# get the % in which the PPD district overlaps the zip code
ratio = intersection_geom.area / zip_code_geom.area
# return only districts that make up >= 5% of total intersecting zip code
if ratio >= 0.05:
intersecting_districts.add(district_id)
return intersecting_districts
# update police districts field ONLY if it isn't already filled out
where_clause = f"{progzipfld} IS NOT NULL AND ({progppdfld} IS NULL OR {progppdfld} = '')"
    # counter starts at 0; each row matching where_clause increments it by 1
selected_rows_count = 0
# update cursor / rows
with arcpy.da.UpdateCursor(programlyr, [progzipfld, progppdfld], where_clause) as cursor:
for row in cursor:
# each row that meets where_clause criteria increases counter by 1
selected_rows_count += 1
# add corresponding ID for each row that meets criteria
zip_codes_str = row[0]
# applies our function to return intersecting PPD districts
intersecting_districts = get_intersecting_districts(zip_codes_str)
            # update the ppd_service field with a comma-separated, numerically sorted list of matching districts
            row[1] = ",".join(sorted(intersecting_districts, key=int))
cursor.updateRow(row)
# return number of rows processed - check if correct
print(f"Selected Rows: {selected_rows_count}")
# execute function
matchppdstr("asset_programs", "ppd_poly", "zips_poly", "zip_service", "ppd_service",
"ppd_id_prog", "CODE")
We want a consistent schema each time we run this notebook and export the layers to AGO. To achieve this, we can write some code to enforce the field names and lengths we specify, since the fields currently all have inconsistent lengths.
Because there is no known way in arcpy to edit the lengths of existing fields, we use a workaround: export (copy) the feature with the contents of each field, and use field mapping to set the new field lengths on the copy.
Future work can include finding a way to edit existing field lengths or add new fields of the proper lengths within the original layer.
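One candidate for that future work, sketched here but untested: arcpy.management.AlterField accepts a field_length parameter for text fields, though support for changing the length of a populated field varies by ArcGIS Pro release, so this would need to be verified before it could replace the workaround below. It uses the same field_lengths dictionary defined further down.
# untested alternative: try altering text field lengths in place
for field_name, field_length in field_lengths.items():
    try:
        arcpy.management.AlterField("asset_programs", field_name,
                                    field_length=field_length)
    except arcpy.ExecuteError:
        print(f"Could not alter '{field_name}' in place")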
# define function
def set_field_lengths(programs, field_lengths):
# create a variable of the programs layer name
shapefile_name = arcpy.Describe(programs).name
    # build the path for a temporary copy of the programs layer
    temp_feature_class = os.path.join(fgdb, shapefile_name + "_temp")
# create a field mapping object
field_mappings = arcpy.FieldMappings()
# loop through each field and create a new field with the specified length in the field mapping
for field_name, field_length in field_lengths.items():
try:
# Create a new field mapping
field_map = arcpy.FieldMap()
field_map.addInputField(programs, field_name)
# Create a new field with the specified length in the field mapping
new_field = arcpy.Field()
new_field.name = field_name
new_field.type = field_map.outputField.type
new_field.length = field_length
field_map.outputField = new_field
# Add the field mapping to the field mappings object
field_mappings.addFieldMap(field_map)
except arcpy.ExecuteError as e:
error_message = f"Error setting field length for '{field_name}': {e}"
print(error_message)
try:
# export features using field mapping
arcpy.conversion.ExportFeatures(programs, temp_feature_class, field_mapping=field_mappings)
# delete the original feature class
arcpy.management.Delete(os.path.join(fgdb, programs))
        # rename the copied feature class to the original name
arcpy.management.Rename(temp_feature_class, shapefile_name)
print("Field length changes completed successfully.")
except arcpy.ExecuteError as e:
error_message = f"Error exporting features with field mapping: {e}"
print(error_message)
# set path of programs feature
programs = "asset_programs"
# set field length dictionary
field_lengths = {'program_name': 75, 'program_type': 15, 'progtype_other': 50,
'services': 150, 'services_other': 255, 'referral_services': 150, 'referral_services_other': 255,
'self_refer': 13, 'capacity': 10, 'service_length': 6, 'desc_serviceprov': 255, 'desc_spservices': 255,
'desc_eligibility': 255, 'comments': 255, 'email': 100, 'phone': 12, 'website': 255, 'address': 100,
'address_street': 100, 'address_city': 25, 'address_zipcode': 5, 'zip_service': 300, 'ppd_service': 100,
'citywide': 1, 'zip_ppd': 10, 'self_drawn_bound': 1, 'poly2': 1}
# execute function
set_field_lengths(programs, field_lengths)
We can print the fields of the new layer to make sure everything looks good. We're specifically interested in the string fields since they're the only ones we changed.
# function to print existing fields
def print_fields(shapefile):
# Use ListFields to get a list of field objects
fields = arcpy.ListFields(shapefile)
# Print field information
for field in fields:
print(f"Field Name: {field.name}")
print(f" Type: {field.type}")
print(f" Length: {field.length}")
# execute function
print_fields("asset_programs")
Geocoding the headquarters address with AIS adds and populates X and Y coordinate fields on the layer we were just working with, but only for Philadelphia addresses. (AIS was used instead of other geocoders because of its accuracy and because it does not consume ArcGIS credits.) A separate point feature then needs to be created from these coordinates and checked for accuracy.
The code below was adapted from the Office of Licenses & Inspections (L&I) for this particular use. It uses arcpy and the Philadelphia Address Information System (AIS), developed by OIT, to automatically parse each feature's address and add and populate coordinate fields representing the parcel address.
# select programs feature for input
input_table = os.path.join(fgdb, 'asset_programs')
# select input address field from programs feature
address = 'address'
# paste desired output fields here as python list (just need address and coordinates)
input_fields = ['street_address']
# select coordinate system for output coordinates: '2272' (NAD83 / Pennsylvania South (ftUS) ) OR '4326' (WGS 84 - Standard lat/lon coordinates)
coordinates = '4326'
# to get through proxy, you will need to enter your City username and password
user = 'evelyn.gorey'
password = ''
# you will need to specify your own unique AIS key
aiskey = ''
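Before running the full loop, it can help to preview the shape of a single AIS response. This minimal sketch assumes a valid aiskey and direct network access (add a proxies argument if running behind the City proxy); the address is hypothetical.
# single AIS lookup to preview the response structure the loop below relies on
r = requests.get('http://api.phila.gov/ais/v1/search/1234 MARKET ST',
                 params={'gatekeeperKey': aiskey, 'srid': coordinates})
resp = r.json()
if 'error' not in resp:
    feat = resp['features'][0]
    print(feat['match_type'], feat['geometry']['geocode_type'], feat['geometry']['coordinates'])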
# geocode using AIS - code from L&I that was adapted to this particular use
# more on AIS here: https://github.com/CityOfPhiladelphia/ais
basestring = str  # Python 2/3 compatibility alias kept from the original L&I script
start_time = datetime.datetime.now()
print('Beginning AIS Geocode Tool')
print(start_time)
print(coordinates)
print(address)
print(input_fields)
# ensures address field is included in input_fields if not already
if 'street_address' not in input_fields:
input_fields.insert(0, 'street_address')
print(input_fields)
# returns number of rows/programs to geocode
countin = int(arcpy.GetCount_management(input_table).getOutput(0))
count = 0
print('Found {0} records in input table.'.format(countin))
# progress breakpoints at every 10% of the input rows
breaks = [int(float(countin) * float(b) / 100.0) for b in range(10, 100, 10)]
lenList = []
infoDict = {}
addrCursor = arcpy.da.SearchCursor(input_table, address)
#addrCursor = ['6635 MCCALLUM ST'] # line for testing a single address
for addr in addrCursor:
if addr is not None:
try:
count += 1
if count in breaks:
print('Part 1 of 3: Parsing Input Addresses ' + str(int(round(count * 100.0 / countin))) + '% complete...')
# Part 1 -- US Address Parser
parsed = usaddress.tag(addr[0], tag_mapping={
'Recipient': 'recipient',
'AddressNumber': 'address1',
'AddressNumberPrefix': 'address1',
'AddressNumberSuffix': 'address1',
'StreetName': 'address1',
'StreetNamePreDirectional': 'address1',
'StreetNamePreModifier': 'address1',
'StreetNamePreType': 'address1',
'StreetNamePostDirectional': 'address1',
'StreetNamePostModifier': 'address1',
'StreetNamePostType': 'address1',
'CornerOf': 'address1',
'IntersectionSeparator': 'address1',
'LandmarkName': 'address1',
'USPSBoxGroupID': 'address1',
'USPSBoxGroupType': 'address1',
'USPSBoxID': 'address1',
'USPSBoxType': 'address1',
'BuildingName': 'address2',
'OccupancyType': 'address2',
'OccupancyIdentifier': 'address2',
'SubaddressIdentifier': 'address2',
'SubaddressType': 'address2',
'PlaceName': 'city',
'StateName': 'state',
'ZipCode': 'zip_code',
})
            # usaddress returns (OrderedDict, address_type); use .get() so
            # missing components fall back to None instead of raising
            tagged = parsed[0]
            address1 = tagged.get('address1')
            address2 = tagged.get('address2')
            zipcode = tagged.get('zip_code')
            city = tagged.get('city')
            state = tagged.get('state')
            paddress1 = {'address1': address1,
                         'address2': address2,
                         'zipcode': zipcode}
# Part 2 -- Passyunk Parser
p = PassyunkParser()
preP = paddress1['address1'] + ' ' + paddress1['address2'] if paddress1['address2'] is not None else paddress1['address1']
paddr = p.parse(preP)
# Part 3 -- AIS API
# if there is no unit, just find base address
if paddr['components']['address_unit']['unit_num'] is None:
proxyDict = {'http':'http://'+user+':' + password+'@proxy.phila.gov:8080', 'https': 'https://'+user+':' + password+'@proxy.phila.gov:8080'}
r = requests.get(
'http://api.phila.gov/ais/v1/search/' + paddr['components']['output_address'] + '?gatekeeperKey='+aiskey+'&srid='+coordinates, proxies=proxyDict)
aisString = r.json()
if 'error' not in aisString:
features = aisString['features']
properties = features[0]['properties']
                    # row payload: match type, each requested property, then geometry info,
                    # ordered to align with the update cursor fields used further down
                    addrInfo = [features[0]['match_type']] + [properties[f.lower()] for f in input_fields]
                    geometry = features[0]['geometry']
                    addrInfo.extend([geometry['geocode_type']] + list(geometry['coordinates']))
infoDict[addr[0]] = addrInfo
if not lenList:
lenList = [len(properties[f.lower()]) if properties[f.lower()] is not None and isinstance(properties[f.lower()], basestring) else len('|'.join(properties[f.lower()])) if isinstance(properties[f.lower()], list) else 1 for f in
input_fields]
else:
tempLenList = [len(properties[f.lower()]) if properties[f.lower()] is not None and isinstance(properties[f.lower()], basestring) else len('|'.join(properties[f.lower()])) if isinstance(properties[f.lower()], list) else 1 for f in
input_fields]
for i, l in enumerate(lenList):
if tempLenList[i] > l:
lenList[i] = tempLenList[i]
del tempLenList
# if there is a unit, find unit information
else:
proxyDict = {'http':'http://'+user+':' + password+'@proxy.phila.gov:8080', 'https': 'https://'+user+':' + password+'@proxy.phila.gov:8080'}
r = requests.get(
                    'http://api.phila.gov/ais/v1/search/' + paddr['components']['output_address'] + '?include_units&gatekeeperKey='+aiskey+'&srid='+coordinates, proxies=proxyDict)
aisString = r.json()
if 'error' not in aisString:
features = aisString['features']
properties = features[0]['properties']
                    # same payload ordering as the no-unit branch above
                    addrInfo = [features[0]['match_type']] + [properties[f.lower()] for f in input_fields]
                    geometry = features[0]['geometry']
                    addrInfo.extend([geometry['geocode_type']] + list(geometry['coordinates']))
infoDict[addr[0]] = addrInfo
if not lenList:
lenList = [len(properties[f.lower()]) if properties[f.lower()] is not None else 1 for f in
input_fields]
else:
tempLenList = [len(properties[f.lower()]) if properties[f.lower()] is not None else 1 for f in
input_fields]
for i, l in enumerate(lenList):
if tempLenList[i] > l:
lenList[i] = tempLenList[i]
del tempLenList
        except arcpy.ExecuteError:
            print('Row Failure')
            msgs = arcpy.GetMessages(2)
            print(msgs)
        except Exception:
            print(addr[0])
            print('Row Failure')
            tb = sys.exc_info()[2]
            tbinfo = traceback.format_tb(tb)[0]
            pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
            print(pymsg)
del addrCursor
del count
arcpy.AddField_management(input_table, field_name='MATCH_TYPE', field_type='TEXT', field_length='20')
print('Adding fields to original table')
existingFields = [f.name for f in arcpy.ListFields(input_table)]
fixDict = {}
for i, f in enumerate(input_fields):
    # while building new fields, check that they don't already exist; if they
    # do, append '_1' to the new field name and update input_fields
    fieldDup = None
    while f in existingFields:
        f = f + '_1'
        fieldDup = i
try:
arcpy.AddField_management(in_table=input_table, field_name=f, field_type='TEXT', field_length=lenList[i])
    except arcpy.ExecuteError:
        print('Field Failure: ' + f)
        msgs = arcpy.GetMessages(2)
        print(msgs)
    except Exception:
        print('Field Failure: ' + f)
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
        print(pymsg)
if fieldDup is not None:
fixDict[i] = f
del lenList
# update field names in input_fields if duplicates are present
if len(fixDict) > 0:
for i, f in fixDict.items():
input_fields[i] = f
print('Add Geometry Fields')
arcpy.AddField_management(input_table, field_name='geocode_type', field_type='TEXT', field_length='20')
arcpy.AddField_management(input_table, field_name='GEOCODE_X', field_type='DOUBLE')
arcpy.AddField_management(input_table, field_name='GEOCODE_Y', field_type='DOUBLE')
count = 0
infoCursor = arcpy.da.UpdateCursor(input_table, [address, 'MATCH_TYPE'] + input_fields+['geocode_type', 'GEOCODE_X', 'GEOCODE_Y'])
for info in infoCursor:
try:
count += 1
if count in breaks:
print('Updating original table ' + str(int(round(count * 100.0 / countin))) + '% complete...')
if info[0] in infoDict:
aisData = infoDict.get(info[0])
dCount = 1
for data in aisData:
if isinstance(data, list):
info[dCount] = '|'.join(data)
else:
info[dCount] = data
dCount += 1
infoCursor.updateRow(info)
    except arcpy.ExecuteError:
        print('Script Failure')
        msgs = arcpy.GetMessages(2)
        print(msgs)
    except Exception:
        print('Script Failure')
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
        print(pymsg)
del infoCursor
print('AIS Data Successfully Updated')
print(datetime.datetime.now() - start_time)
Now that the X & Y coordinates have been added to the original feature using AIS, a new point feature can be created to display these points. The feature must first be exported to a table since the feature, which is currently a polygon feature, cannot be exported to a point feature.
Once the feature has been exported to a table, the point feature can be created uisng the X and Y coordinate fields. The coordinate system is still WGS 1984 since that's how the points needed to be geocoded, but we can reproject the new feature to be StatePlane to match the polygon feature.
# define function to combine steps
def geocoded_points(input_table, points):
    # export polygon layer w/ coordinates as a table to the geodatabase - required for the XY step
arcpy.conversion.ExportTable(
in_table=input_table,
out_table=os.path.join(fgdb, "pointstable"))
# create point table from feature
arcpy.management.XYTableToPoint(
in_table="pointstable",
out_feature_class=os.path.join(fgdb, "points_xy"),
x_field="GEOCODE_X",
y_field="GEOCODE_Y",
        # the input coordinates are WGS 1984 lat/lon (how AIS returned them)
coordinate_system='GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]];-400 -400 1000000000;-100000 10000;-100000 10000;8.98315284119521E-09;0.001;0.001;IsHighPrecision')
# reproject the point feature to StatePlane Pennsylvania South
arcpy.management.Project(
in_dataset="points_xy",
out_dataset=os.path.join(fgdb, points),
out_coor_system='PROJCS["NAD_1927_StatePlane_Pennsylvania_South_FIPS_3702",GEOGCS["GCS_North_American_1927",DATUM["D_North_American_1927",SPHEROID["Clarke_1866",6378206.4,294.9786982]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",2000000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-77.75],PARAMETER["Standard_Parallel_1",39.93333333333333],PARAMETER["Standard_Parallel_2",40.96666666666667],PARAMETER["Latitude_Of_Origin",39.33333333333334],UNIT["Foot_US",0.3048006096012192]]',
transform_method="'WGS_1984_(ITRF00)_To_NAD_1983 + NAD_1927_To_NAD_1983_NADCON'",
in_coor_system='GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]')
    # delete the intermediate table and unprojected point layer
arcpy.management.Delete([os.path.join(fgdb, "pointstable"), os.path.join(fgdb, "points_xy")])
# execute function
geocoded_points("asset_programs", "inj_hqs")
The code block below removes unnecessary fields from the new point feature.
# define function to combine steps
def hqfieldclean(hqs):
    # get the fields that actually exist on the layer
    fields = [f.name for f in arcpy.ListFields(hqs)]
    # fields we no longer need; keep only those present to avoid delete errors
    del_fields = [f for f in ['zip_ppd', 'self_drawn_bound', 'poly2', 'MATCH_TYPE',
                              'opa_address', 'street_address', 'geocode_type',
                              'GEOCODE_X', 'GEOCODE_Y'] if f in fields]
    # delete the specified fields
    arcpy.DeleteField_management(hqs, del_fields)
# execute function
hqfieldclean("inj_hqs")
Loaded in the Contents pane are the zip code and PPD district features, pulled directly from AGO. A citywide boundary feature has also been created by dissolving the zip codes, which keeps the boundaries reasonably clean (the PPD districts and zip codes already don't align perfectly). Using these three features (zip codes, PPD districts, and citywide), the goal is to update the empty polygon geometry of each program/row in the original Survey123 feature with the corresponding geometry from one of the three features, depending on which geographic unit the program selected for its service area boundary. The other provided fields determine which of the three geometry types the code applies: whether the program selected citywide ('y'), whether it prefers zip codes or PPD districts when citywide is 'n', and whether it drew its own boundary.
The zip code layer was dissolved beforehand to ensure that the citywide boundary matches the zip codes well. The resulting citywide layer then gets an added field indicating that it is citywide, to match the survey output layer.
The following code ensures that programs with citywide service area boundaries now have geometries that cover the entire city.
# define function to add the citywide geometry to rows where citywide = 'y'
def addcitywidegeom(citywidelyr, citywidefield, programlyr, programfield):
# add a citywide indicator field in the citywide layer to match programs layer
arcpy.AddField_management(citywidelyr, citywidefield, "TEXT")
# populate the new citywide indicator field to match programs layer
with arcpy.da.UpdateCursor(citywidelyr, [citywidefield]) as cursor:
for row in cursor:
row[0] = "y"
cursor.updateRow(row)
# create a dictionary storing the citywide geometry for citywide = 'y'
geometries = {key:value for (key,value) in arcpy.da.SearchCursor(citywidelyr, [citywidefield, 'SHAPE@'])}
# empty list to store ids in program layer not found in the citywide layer
notfound = []
# update program layer with geometries from citywide layer where citywide fields match
with arcpy.da.UpdateCursor(programlyr, [programfield, 'SHAPE@']) as cursor:
for row in cursor:
try:
row[1] = geometries[row[0]]
cursor.updateRow(row)
            except KeyError:
                notfound.append(row[0])
    # any response that isn't citywide will simply land in this "not found" list
    print('Found no id match and could not update geometry for IDs: ', notfound)
# execute function
addcitywidegeom("citywide", "citywide", "asset_programs", "citywide")
For survey entries that have zip codes listed as the service area boundary and prefer zip codes over PPD districts, the zip codes layer will be used to add corresponding zip code geometries to each entry.
The code has to list the survey entry output layer's separated zip code columns and match them to the zip code layer by zip code name; geometries for the survey entry output will then be replaced by the corresponding zip codes from the zip code layer. The zip codes will then be merged and dissolved into one boundary. All other attributes should remain the same for the survey entry layer.
# create function to add merged and dissolved zip code geometries to applicable rows
def addzipgeoms(ziplyr, zipfield, programlyr, programfield):
# set up a temporary workspace on disk
temp_workspace = arcpy.env.scratchGDB
# create a dictionary of zip codes and their geometries from the zip code feature
geometries = {key: value for (key, value) in arcpy.da.SearchCursor(ziplyr, [zipfield, 'SHAPE@'])}
# list to store dissolved geometries for each program row
dissolved_geometries = []
# SQL query to only update programs that aren't citywide or prefer to match PPD boundaries
sql_query = "citywide = 'n' AND self_drawn_bound = 'n' AND zip_ppd = 'zip'"
# loop over each program row
with arcpy.da.SearchCursor(programlyr, [programfield, 'SHAPE@'], sql_query) as cursor:
for row in cursor:
print(f"Processing row with zip codes: {row[0]}")
# split the string list of zip codes in the zip_service field
zip_codes = row[0].split(',')
# get the geometries for each zip code from the dictionary
temp_geoms = [geometries.get(zip_code.strip()) for zip_code in zip_codes if zip_code.strip() in geometries]
try:
# merge and dissolve geometries for each individual row, store as temps
if temp_geoms:
temp_merged = arcpy.Merge_management(temp_geoms, os.path.join(temp_workspace, "tempMerged"))
temp_dissolved = arcpy.Dissolve_management(temp_merged, os.path.join(temp_workspace, "tempDissolved"))
# create a temporary feature class for the dissolved geometry
temp_fc = os.path.join(temp_workspace, "tempDissolvedFC")
arcpy.CopyFeatures_management(temp_dissolved, temp_fc)
# get the first feature from the dissolved geometry
with arcpy.da.SearchCursor(temp_fc, ['SHAPE@']) as dissolve_cursor:
for dissolve_row in dissolve_cursor:
dissolved_geometry = dissolve_row[0]
break
# add dissolved geometry to the list
dissolved_geometries.append(dissolved_geometry)
# clean up the temporary feature classes
arcpy.management.Delete([temp_merged, temp_dissolved, temp_fc])
# if multipart geometry error is raised, skip
except arcpy.ExecuteError:
print(f"Skipping row due to multipart geometry issue.")
pass
    # update all matching rows with the dissolved geometries (relies on the search and
    # update cursors returning rows in the same order, with no rows skipped above)
with arcpy.da.UpdateCursor(programlyr, ['SHAPE@'], sql_query) as update_cursor:
for dissolved_geometry in dissolved_geometries:
# update each row geometry with the corresponding dissolved temp geometry
update_row = next(update_cursor)
update_row[0] = dissolved_geometry
update_cursor.updateRow(update_row)
# clean up by deleting temporary objects
arcpy.Delete_management(temp_workspace)
# execute function
addzipgeoms("zips_poly", "CODE", "asset_programs", "zip_service")
For survey entries that have PPD districts listed as the service area boundary and prefer PPD districts over zip codes, the PPD districts layer will be used to add corresponding PPD district geometries to each entry.
Because PPD district topology is messier than the zip codes (there are holes/gaps in the polygons that should be filled for accuracy and aesthetics), the Eliminate Polygon Part tool is incorporated in this code block to fill contained gaps smaller than 10% of the total area.
# create function to add merged and dissolved PPD district geometries to applicable rows
def addppdgeoms(ppdlyr, ppdfield, programlyr, programfield):
# set up a temporary workspace on disk
temp_workspace = arcpy.env.scratchGDB
# create a dictionary of PPD districts and their geometries from the PPD district feature
geometries = {key: value for (key, value) in arcpy.da.SearchCursor(ppdlyr, [ppdfield, 'SHAPE@'])}
# list to store dissolved geometries for each program row
dissolved_geometries = []
    # select only programs that aren't citywide, didn't draw their own boundary, and prefer PPD districts
sql_query = "citywide = 'n' AND self_drawn_bound = 'n' AND zip_ppd = 'ppd'"
# loop over each program row
with arcpy.da.SearchCursor(programlyr, [programfield, 'SHAPE@'], sql_query) as cursor:
for row in cursor:
print(f"Merging and dissolving row with PPD districts: {row[0]}")
            # split the string list of PPD districts in the ppd_service field
            ppd_districts = row[0].split(',')
            # get the geometries for each PPD district from the dictionary
temp_geoms = [geometries.get(ppd_district.strip()) for ppd_district in ppd_districts if ppd_district.strip() in geometries]
try:
# merge and dissolve geometries for each individual row, store as temps
if temp_geoms:
temp_merged = arcpy.Merge_management(temp_geoms, os.path.join(temp_workspace, "tempMerged"))
temp_dissolved = arcpy.Dissolve_management(temp_merged, os.path.join(temp_workspace, "tempDissolved"))
# create a temporary feature class for the dissolved geometry
temp_fc = os.path.join(temp_workspace, "tempDissolvedFC")
arcpy.CopyFeatures_management(temp_dissolved, temp_fc)
# get the first feature from the dissolved geometry
with arcpy.da.SearchCursor(temp_fc, ['SHAPE@']) as dissolve_cursor:
for dissolve_row in dissolve_cursor:
dissolved_geometry = dissolve_row[0]
break
# add dissolved geometry to the list
dissolved_geometries.append(dissolved_geometry)
# clean up the temporary feature classes
arcpy.Delete_management([temp_merged, temp_dissolved, temp_fc])
# if multipart geometry error is raised, skip
except arcpy.ExecuteError:
print(f"Skipping row due to multipart geometry issue.")
pass
    # update all matching rows with the dissolved geometries (same ordering caveat as above)
with arcpy.da.UpdateCursor(programlyr, ['SHAPE@'], sql_query) as update_cursor:
for dissolved_geometry in dissolved_geometries:
# update each row with the corresponding dissolved geometry
update_row = next(update_cursor)
update_row[0] = dissolved_geometry
update_cursor.updateRow(update_row)
# fill gaps in geometry due to poor quality PPD district topology
arcpy.management.EliminatePolygonPart(programlyr, os.path.join(fgdb, "geom_nogaps"),
condition="PERCENT", part_area_percent=10,
part_option="CONTAINED_ONLY")
    # Eliminate Polygon Part doesn't allow overwriting features, so copy the output back over the programs layer
arcpy.management.CopyFeatures("geom_nogaps", os.path.join(fgdb, programlyr))
# clean up by deleting temporary objects
arcpy.Delete_management([temp_workspace, os.path.join(fgdb, "geom_nogaps")])
# execute function
addppdgeoms("ppd_poly", "ppd_id_prog", "asset_programs", "ppd_service")
The following code block clips all program boundaries to the dissolved citywide boundary to keep the boundaries fairly clean, since the PPD districts layer tends to extend beyond the zip codes layer. The code also sorts the programs in the resulting feature so that citywide programs sit at the bottom of the feature, for visualization purposes: in a web app, it is easiest for a user to click and see the attributes of programs with smaller boundaries when the citywide boundaries don't sit on top of them.
# define function
def clipsort(x):
# clip to citywide boundary for cleaning
arcpy.analysis.Clip(x, "citywide", "asset_programs_clean")
# sort the order so that citywide boundaries are shown below non-citywide
arcpy.management.Sort("asset_programs_clean",
os.path.join(fgdb, "asset_programs"),
sort_field="citywide ASCENDING", spatial_sort_method="UR")
# delete intermediate layer - arcpy doesn't allow overwriting
arcpy.management.Delete(os.path.join(fgdb, "asset_programs_clean"))
# execute function
clipsort("asset_programs")