#!/usr/bin/env python # coding: utf-8 # # Philadelphia Parks & Recreation (PPR) Crime Report ArcGIS Notebook # ## Project background # A separate analysis ArcGIS Notebook created a GIS point feature that joins the Philadelphia Parks & Recreation (PPR) properties spatial layer with the information for each crime that occurred this year within 5ft of each property. This feature includes all information regarding each crime’s season, the time it occurred, and crime type; this information comes with the publicly available Philadelphia Police Department (PPD) crime data, but it is reshaped in our analysis to meet the standards of our crime report. This Crime Report Notebook takes the crime points layer - or the crimes that occurred in 2021 within 5ft of Philadelphia Parks & Recreation properties - and creates new dataframes, or tables, from this layer in order to summarize and display our data in the pie charts, overview tables, and individual site tables that our annual reports consist of. # The pie charts, overview tables, and individual site tables are exported and formatted programmatically within this report so that they can be added as separate elements into the annual report, which is expected to be developed using HTML and CSS in the future. In the report, each property that had a crime take place within its boundaries is given its own section with tables and other graphics used to summarize/break down the crimes that occurred on the property by 1) the crime type, 2) the seasons in which the crimes occurred, and 3) the hours/time of day in which the crimes occurred. Therefore, this Notebook iterates over many PPR properties and their corresponding crimes to produce all of these elements (pie charts, various tables per property, etc.), which are saved separately to be added to the report. 
I simply replicated the existing design and formatting for the charts and tables in Python; I did not design any formats or our methods for how we determine what levels of "crime" are significant or insignificant at any given PPR property. # This Notebook consists of 4 main sections, each with many subsections. The structure breaks up repeating report elements (such as pivot tables, site pie charts, etc.) by district so that we can tailor the code to specific districts and more quickly locate potential errors. Since this is the first year of the code's development, the analyst running the code will occasionally have to change some elements of the code to tailor it to specific districts. Currently, the output includes 1) hundreds of tables saved as HTML files and styled with CSS stylesheets, 2) pie charts for each district and each site within each district saved as PNGs, and 3) an overview bar graph also saved as a PNG. Only the code for the overview visualizations and District 1 is included in this Notebook. # ## Setting Up # ### Import libraries and set environment # The cell below imports the necessary libraries - in this case, we need pandas, a common open-source library for table and summary manipulation, the Operating System library, numpy (for statistics), the ArcGIS API for Python, and ArcPy. # In[1]: # the import statements load each module import pandas as pd import os import numpy as np from pathlib import Path import arcgis import arcpy # Best practice: load arcpy last so its names take namespace priority # The cells below set the default ArcGIS Pro project and the path that the figures and tables created below should be saved to. 
# In[2]: # set ArcGIS Pro project aprx = arcpy.mp.ArcGISProject("CURRENT") mp = aprx.listMaps('Map')[0] # first map in project - mine is titled "Map" # In[ ]: # set the path that the figures and tables should be saved to path = Path("") # paste path here print(path.absolute()) # ### Convert ``PPR_Crime_5ft`` to spatial dataframe for quicker manipulation # Our crime report filters our sites down further than the crime analysis did. We only want program sites, so we're going to set a SQL definition query to include only program sites in our report. This layer is already loaded in our Contents Pane within ArcGIS Pro. # In[5]: # set definition query for only program sites for lyr in mp.listLayers("PPR_Crime_5ft"): lyr.definitionQuery = "PROGRAM_SITES = 'Y'" # To manipulate our data using the pandas library, we need to convert our feature into a spatial dataframe. # In[6]: sdf = pd.DataFrame.spatial.from_featureclass("PPR_Crime_5ft") sdf.shape # to find number of rows and columns # Add an incident count column per PPR property (based on public name to avoid duplicates) - this is like summarizing our data in a separate table, but we're including it in our existing dataframe instead. We're counting the number of crime point incidents for each PPR property. # In[7]: # summarize incident count sdf['incident_count'] = sdf.groupby('PUBLIC_NAME')['PUBLIC_NAME'].transform('count') # If you preview the dataframe to make sure it aligns with the feature's attribute table using the head() function, you should see an accurate count of crimes per property/row in the last column of the dataframe. # ## Overview / Page 1 Tables and Charts # This section includes the code that creates our tables and charts for just the first page of the report. This page includes a pivot table to summarize crime counts by part (broader crime type) and district, an overall pie chart showing crime part by count, and an overall bar graph showing crime count by council district. 
These tables and charts represent all crimes that occurred on all PPR properties this year. # ### Center Pivot Table # We're using pandas to create a pivot table from our original ``sdf`` dataframe. This pivot table summarizes incident counts by crime type and district number. We format it using pandas and then save it to an HTML, which references a CSS stylesheet, to add colors, cell borders, font, and more. # In[9]: # get new df with only GIS Part by incident count - same as pg 1 upper-mid table # format pivot table as first step part_bycount = sdf.pivot_table(values='incident_count', index=['COUNCILDIS'], columns=['GIS_PART'], aggfunc='count') part_bycount.index.names = ['District #'] # rename index column part_bycount.columns.name = None # get rid of extra column name part_bycount['Total Incidents'] = part_bycount.sum(axis=1) # get total incident column part_bycount.loc["All Districts"] = part_bycount.sum() # get total incidents bottom row part_bycount.head(11) # see all 11 rows # Our pivot table isn't perfect (would be great to merge those first two header rows and highlight max values), but now we can export it as an HTML table and style it as much as we can using our CSS style sheet (path removed in code below). # In[10]: pd.set_option('colheader_justify', 'center') # FOR TABLE html_string = ''' {table} . ''' with open(str(path) + "/OverviewPage/page1pivottable.html", 'w') as f: f.write(html_string.format(table=part_bycount.to_html(classes='mystyle'))) # If checking for the resulting table in the specified folder, you should find an HTML file that will look like this: # # ![output_pivot.PNG](attachment:output_pivot.PNG) # ### Pie Chart # To create the overall crime pie chart, which represents crime part (broader crime type) by crime count for all properties during the entire year, we first need a separate dataframe with only our counts per part. 
# In[11]: page1pie = sdf.groupby('GIS_PART')['incident_count'].agg('count').reset_index() page1pie.head(5) # With our new dataframe, we can now create our pie chart for the first page (all crimes). # In[12]: # Import libraries import numpy as np import matplotlib.pyplot as plt import matplotlib get_ipython().run_line_magic('matplotlib', 'inline') matplotlib.rcParams['font.sans-serif'] = "Open Sans" matplotlib.rcParams['font.family'] = "sans-serif" # Creating color parameters colors = ("#437EC6", "#B43C3C", "#91B151") explode = 0.04, 0.05, 0.04 # Wedge properties wp = { 'linewidth' : 1} # Creating autopct arguments def autopct_format(values): def my_format(pct): total=sum(values) val=int(round(pct*total/100.0)) return '{:.0f}%\n({v:d} incidents)'.format(pct, v=val) return my_format # creates labels text - % and # of incidents # Creating plot fig, ax = plt.subplots(figsize =(10, 7)) # how big your overall figure (around pie chart) is wedges, texts, autotexts = ax.pie(page1pie['incident_count'], # value - crime count autopct = autopct_format(page1pie['incident_count']), explode = explode, colors = colors, # shadow = True, - turn on if you want drop shadow startangle = 90, wedgeprops = wp, textprops = dict(color ="black")) # text color # Adding legend ax.legend(wedges, page1pie['GIS_PART'], prop={'size':13}, # last parameter is legend font size loc ="upper center", # text in legend bbox_to_anchor =(0.8, 0, 0.4, 1)) # where the legend is placed plt.setp(autotexts, size = 13.5, weight ="bold") # labels ax.set_title("Total Incidents at PPR Sites by Part", weight='bold', fontname='Open Sans', fontsize='20') # title # show plot fig.savefig(str(path) + "/OverviewPage/page1pie.png", dpi=300) plt.show() # previews plot below # The resulting pie chart should be a PNG and will be located in the specified folder. Colors correspond to crime part and will remain consistent throughout the report. 
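# The ``autopct_format`` closure above can be exercised on its own, without matplotlib: matplotlib calls the inner function with each wedge's percentage, and the closure converts that percentage back into a raw count for the label. A minimal sketch with illustrative counts:

```python
def autopct_format(values):
    """Build a formatter for matplotlib's autopct: it receives a wedge's
    percentage and must return the label string for that wedge."""
    def my_format(pct):
        total = sum(values)
        val = int(round(pct * total / 100.0))  # recover the raw count from the percentage
        return '{:.0f}%\n({v:d} incidents)'.format(pct, v=val)
    return my_format

counts = [30, 50, 20]  # toy stand-in for page1pie['incident_count']
fmt = autopct_format(counts)
print(fmt(50.0))  # prints '50%' then '(50 incidents)' on the next line
```

# The closure captures ``counts``, which is why the formatter can translate a percentage back into an absolute incident count without matplotlib passing it any extra arguments.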
# ### Bar Graph # We're creating a new table with the same information as the pivot table for the bar graph. This table and graph show the number of incidents per part (broader crime type) by council district for the entire year. # In[13]: bargraph = sdf.groupby(['COUNCILDIS', 'GIS_PART'])['incident_count'].count().reset_index() bargraph.head(10) # Using the table we just created above, we're going to build and style a bar graph of crime count by council district. # In[14]: import matplotlib.pyplot as plt import numpy as np import matplotlib get_ipython().run_line_magic('matplotlib', 'inline') matplotlib.rcParams['font.sans-serif'] = "Open Sans" matplotlib.rcParams['font.family'] = "sans-serif" # Create a grouped bar chart, with council district as the x-axis # and crime part as the variable we're grouping on so there # are three bars per district. fig, ax = plt.subplots(figsize=(12, 8)) # Our x-axis. We basically just want a list # of numbers from zero with a value for each # of our districts. x = np.arange(len(bargraph.COUNCILDIS.unique())) # Define bar width. We need this to offset the second and third bars. bar_width = 0.25 # First series of bars at the base x positions. b1 = ax.bar(x, bargraph.loc[bargraph['GIS_PART'] == 'Part 1 Property Crime', 'incident_count'], color='#437EC6', width=bar_width, label='Part 1 Property Crime') # Same thing, but offset the x. b2 = ax.bar(x + bar_width, bargraph.loc[bargraph['GIS_PART'] == 'Part 1 Violent Crime', 'incident_count'], color='#B43C3C', width=bar_width, label='Part 1 Violent Crime') b3 = ax.bar(x + bar_width + bar_width, bargraph.loc[bargraph['GIS_PART'] == 'Part 2 Crime', 'incident_count'], color='#91B151', width=bar_width, label='Part 2 Crime') # Fix the x-axes - center each tick under the middle of the three bars. ax.set_xticks(x + bar_width) ax.set_xticklabels(bargraph.COUNCILDIS.unique()) # Add legend. ax.legend(prop={'size':13}) # Axis styling. 
ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) ax.spines['left'].set_visible(False) ax.spines['bottom'].set_color('#DDDDDD') ax.tick_params(bottom=False, left=False) ax.set_axisbelow(True) ax.yaxis.grid(True, color='#EEEEEE') ax.xaxis.grid(False) # Add axis and chart labels. ax.set_xlabel('City Council District', fontname='Open Sans', fontsize='14', weight='bold', labelpad=15) ax.set_ylabel('Number of Incidents', fontname='Open Sans', fontsize='14', weight='bold', labelpad=15) ax.set_title('Total Incidents at PPR Sites by City Council District', pad=15, weight='bold', fontname='Open Sans', fontsize='20') # For each bar in the chart, add a text label. for bar in ax.patches: bar_value = bar.get_height() text = f'{bar_value:,}' # This will give the middle of each bar on the x-axis. text_x = bar.get_x() + bar.get_width() / 2 # get_y() is where the bar starts so we add the height to it. text_y = bar.get_y() + bar_value # If we want the text to be the same color as the bar, we can # get the color like so: bar_color = bar.get_facecolor() # If you want a consistent color, you can just set it as a constant, e.g. #222222 ax.text(text_x, text_y, text, ha='center', va='bottom', color=bar_color, size=12) fig.savefig(str(path) + "/OverviewPage/barchart", dpi=300) plt.show() # ## Summary Tables and Charts for Each District # We're now going to create our overview tables and pie charts as well as our individual site tables and pie charts. This section incorporates a subsection for each of the ten City Council Districts, with each district subsection including five subsections: 1) the district cover page summary analysis table, 2) the district cover page pivot table, 3) the district cover page pie chart, 4) the pie charts for each individual PPR site within the district, and 5) the 4 individual tables for each PPR site. 
# ### District 1 # #### Main Summary Analysis Table # The main summary analysis table refers to the table at the top left of the crime report's district overview page. This table includes all site names that have had crimes occur for the entire district. It also includes the number of incidents per site, the type of each crime (``GIS_PART``) and the crime category (``GIS_CRIME``). # In[15]: # filter original dataframe to see only District 1 sites dis1f = sdf[sdf['COUNCILDIS'] == 1] # summarize incident count dis1 = dis1f.groupby(['PUBLIC_NAME', 'GIS_PART', 'GIS_CRIME']).agg({'incident_count': 'count'}).reset_index() # shift incident count column to second first_column = dis1.pop('incident_count') dis1.insert(1, 'incident_count', first_column) # rename columns dis1.rename(columns = {'PUBLIC_NAME':'Site Name', 'GIS_PART':'Category', 'GIS_CRIME':'Crime Category', 'incident_count': '# of Incidents'}, inplace = True) # add a row at the bottom for total incidents - DataFrame.append() was removed in pandas 2.x, so use concat dis1 = pd.concat([dis1, dis1.sum(numeric_only=True).to_frame().T], ignore_index=True) dis1['Site Name'].fillna('Total Incidents in District', inplace=True) # fill in NAs and any decimals dis1 = dis1.replace(np.nan, '', regex=True) pd.set_option('display.precision', 0) # if having issues with decimals showing instead of whole numbers dis1.head(50) # We have our table content and layout from using pandas. Due to pandas' limited styling capabilities, we're going to again use the same CSS stylesheet to export this table as an HTML file so that it uses the fonts and cell colors that we're looking for in our report. # In[16]: pd.set_option('colheader_justify', 'center') # FOR TABLE html_string = ''' {table} . 
''' with open(str(path) + "/Districts/District1/Dis_OverviewPage/dis1summarytable.html", 'w') as f: f.write(html_string.format(table=dis1.to_html(classes='mystyle'))) # Our output is an HTML table that looks like this: # ![summarydis1.PNG](attachment:summarydis1.PNG) # #### Main Pivot Table # The district main pivot table is the table at the top right of the crime report's district overview page. This counts the number of crimes for each crime category (rows) and crime type (columns) for each district. Stylistically, PPR prefers a table that has blank cells instead of zeros, so that has been incorporated in the code below. # In[17]: # shape pivot table and add total column and row dis1_pivot = dis1f.pivot_table(values='incident_count', index=['GIS_CRIME'], columns=['GIS_PART'], aggfunc='count', margins = True, margins_name='Total', fill_value='').reset_index() # rename the crime category column - the GIS_PART column will be hidden by our CSS when exporting dis1_pivot.rename(columns = {'GIS_CRIME': 'Crime Category'}, inplace = True) # remove decimal points pd.set_option('display.precision', 0) dis1_pivot.head(10) # Again, we're exporting the table that we just created using pandas to an HTML so that our styling will be applied. # In[18]: pd.set_option('colheader_justify', 'center') # FOR TABLE html_string = ''' {table} . ''' with open(str(path) + "/Districts/District1/Dis_OverviewPage/dis1pivot.html", 'w') as f: f.write(html_string.format(table=dis1_pivot.to_html(classes='mystyle'))) # The output should look like this: # ![pivotdis1.PNG](attachment:pivotdis1.PNG) # #### Pie Charts for Each Site # We can now set up our dataframe and figure for each site's pie chart. Each site with at least one crime will get its own pie chart to specify the number of crimes that occurred as well as the crime type. # We need to set our wedge color dictionary so the pie chart can assign the correct background colors for each part. 
# In[20]: cdict = {'Part 1 Property Crime': '#437EC6', 'Part 1 Violent Crime': '#B43C3C', 'Part 2 Crime': '#91B151'} # We're going to create a dataframe specifically for setting up the pie charts. Each dataframe row represents the site name, the number of incidents, and the crime type and category. # In[21]: dis1_sites = sdf[sdf['COUNCIL_DISTRICT'] == '1'] # filters to see only District 1 # summarize incident count per site dis1_sites = dis1_sites.groupby(['COUNCIL_DISTRICT', 'PUBLIC_NAME', 'GIS_PART']).agg({'incident_count': 'count'}).reset_index() # set up a pivot table to count sites by GIS type/part dis1_sites = dis1_sites.pivot_table(values='incident_count', index=['PUBLIC_NAME'], columns=['GIS_PART'], aggfunc=np.sum) dis1_sites.reset_index(inplace=True) print(dis1_sites.shape[0]) # checks how many rows there are - SEE BELOW FOR THE OUTPUT AFTER RUNNING. # ^ The little number that appears above represents the number of sites in this district; it will vary between districts and between years. This number will be used to determine how many subplots should be created in the pie chart cell block. A future improvement of the code below is to automatically set the number of subplots so that this code could be run for all districts at once. # In[253]: # import your libraries - all libraries used to generate plots need to be imported again in the cell block that generates the plot import matplotlib.pyplot as plt import numpy as np import matplotlib # setting the default font settings matplotlib.rcParams['font.sans-serif'] = "Open Sans" matplotlib.rcParams['font.family'] = "sans-serif" wp = { 'linewidth' : 1} # sets the number of pie charts and the size between the pie chart subplots fig, axes = plt.subplots(8, 2, figsize=(15, 40)) # CHANGE! numbers for subplots and figsize parameters # generate the pie charts - iterating over many rows in the dataframe for i, (idx, row) in enumerate(dis1_sites.set_index('PUBLIC_NAME').iterrows()): ax = axes[i // 2, i % 2] # CHANGE! 
depending on subplot and figsize parameters. Can only be as big as the second number in the subplots parameter! row = row[row > 0] # drop crime parts with no incidents so empty wedges aren't drawn wedges, texts, autotexts = ax.pie(row, labels=row.index, startangle=30, autopct=lambda x: '{:.0f} ({:.0f}%)'.format((x/100)*row.sum(),x), colors=[cdict[part] for part in row.index], wedgeprops = wp) # sets font size for labels plt.setp(autotexts, size = 11) # sets title and its font size ax.set_title("Total Incidents at PPR Sites by Part:\n" + str(dis1_sites['PUBLIC_NAME'][i]), weight='bold', fontname='Open Sans', fontsize='12', wrap=True) fig.subplots_adjust(wspace=.2) # saves all pie charts to one PNG - work on saving each pie chart as a separate PNG for future fig.savefig(str(path) + "/Districts/District1/Site_PieCharts/dis1_allsites.png", dpi=300) plt.show() plt.close() # #### Tables for Each Site # Each site in the crime report has 4 tables: 1) the first to show incident count by crime category (top left), 2) the second to show incident count by crime type (bottom left), 3) the third to show incident count by active/non-active hours for summer only (top right), and 4) the fourth to show incident count by active/non-active hours for off-season only (bottom right). # # The below cell blocks create and export all 4 tables per site in District 1. The output is 4 individual styled HTML tables per site. The files will be named after the number of the table and the site name. # We will first create another dataframe that has all the information we need for the tables in particular. 
# In[23]: # filter to only include crimes that occurred in District 1 - .copy() avoids a SettingWithCopyWarning on the assignment below dis1_tables = sdf[sdf['COUNCIL_DISTRICT'] == '1'].copy() # summarize incident count dis1_tables['incident_count_new'] = sdf.groupby('GIS_CRIME')['GIS_CRIME'].transform('count') dis1_tables = dis1_tables.groupby(['COUNCIL_DISTRICT', 'PUBLIC_NAME', 'GIS_PART', 'GIS_CRIME', 'SEASON', 'ACTIVE_HOURS']).agg({'incident_count_new': 'count'}).reset_index() dis1_tables.head(10) # ##### Table 1 # Our first table is a pivot table that shows incident count by crime category (``GIS_CRIME``). This first cell block sets up the table, but we also want to add a total incidents row after each site for when we export individual tables for each site. # In[24]: # create the pivot table dis1table1 = dis1_tables.pivot_table(values='incident_count_new', index=['PUBLIC_NAME', 'GIS_CRIME'], aggfunc=np.sum) dis1table1.reset_index(inplace=True) dis1table1.head(10) # Now we need to add our totals row. We want the total of all incidents for each unique site name, so we're going to create a new column for each unique site name that is in the place of our existing site name (``PUBLIC_NAME``) column. We're doing this with a ``groupby()``, where we are essentially calculating the total for each site and then joining it back to the original table in the column of our choice. # # In the future, this code can be cleaned by incorporating the pandas pipe function. 
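# As a hedged sketch of that future cleanup (the helper names here are hypothetical, and the totals row is simplified to write its label directly rather than via a separate ``grand_total`` column), the steps could be wrapped in small functions and chained with ``DataFrame.pipe()``:

```python
import pandas as pd

# toy frame standing in for dis1table1 (site name, crime category, count)
df = pd.DataFrame({
    'PUBLIC_NAME': ['Site A', 'Site A', 'Site B'],
    'GIS_CRIME': ['Theft', 'Assault', 'Theft'],
    'incident_count_new': [3, 1, 2],
})

def add_totals_row(frame, group_col, value_col, label_col, label='Total Incidents'):
    """Append one total row per group, labeled in label_col.
    Note: in this toy example the label sorts after the category names
    alphabetically, which keeps each total row at the bottom of its group."""
    totals = frame.groupby(group_col, as_index=False)[value_col].sum()
    totals[label_col] = label
    out = pd.concat([frame, totals]).sort_values([group_col, label_col])
    return out.reset_index(drop=True)

def rename_for_display(frame):
    """Rename raw field names to the report's display headers."""
    return frame.rename(columns={'PUBLIC_NAME': 'Site Name',
                                 'GIS_CRIME': 'Crime Categories',
                                 'incident_count_new': 'Incident Count'})

# pipe() chains the steps without intermediate dis1table1col/dis1table1final variables
result = (df.pipe(add_totals_row, 'PUBLIC_NAME', 'incident_count_new', 'GIS_CRIME')
            .pipe(rename_for_display))
print(result)
```

# Each helper takes a dataframe and returns one, so the same chain could be reused for tables 1 through 4 by swapping in the relevant column names.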
# In[25]: # create a new dataframe to calculate the total number of incidents per site dis1table1_sum = dis1table1.groupby(['PUBLIC_NAME'], as_index=False)['incident_count_new'].sum() # create a total incidents column to store the totals in dis1table1_sum['grand_total'] = 'Total Incidents' # preparing to ensure that this total column corresponds to the correct site name in the site name columns dis1table1.sort_values(by=['PUBLIC_NAME'], ascending=[True], inplace=True) # create an object for our columns from the original table dis1table1col = dis1table1[['PUBLIC_NAME', 'GIS_CRIME', 'incident_count_new']] # concatenate our totals dataframe to match our original dataframe's columns dis1table1final = pd.concat([dis1table1col, dis1table1_sum]) # our totals columns dis1table1final = dis1table1final.sort_values(by=['PUBLIC_NAME','grand_total'], ascending=[True,True], na_position='first') # add our total rows to the crime category column, as that will be the first column we see in our exported tables dis1table1final['GIS_CRIME'] = np.where(dis1table1final['grand_total'].isnull(), dis1table1final['GIS_CRIME'], dis1table1final['grand_total']) # remove the total column - we no longer need it dis1table1final.drop(['grand_total'], axis=1, inplace=True) # fill any NA values in the table dis1table1final.fillna(value=0, inplace=True) # rename our columns for a more attractive table display dis1table1final.rename(columns = {'PUBLIC_NAME': 'Site Name', 'GIS_CRIME':'Crime Categories', 'incident_count_new': 'Incident Count'}, inplace = True) dis1table1final.head(10) # We now have our table with the number of incident counts per crime category, and each site now has a total incidents number tied to it as well. Now we just need to put these into a dictionary so that we can iterate through each site (the key value). # # We will then save each unique site's table to an HTML table. 
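# The per-site split used for the HTML export relies on ``dict(iter(df.groupby(...)))``, which yields one sub-frame per key. A minimal sketch on toy data (the names are illustrative):

```python
import pandas as pd

# toy stand-in for dis1table1final after renaming
df = pd.DataFrame({
    'Site Name': ['Site A', 'Site A', 'Site B'],
    'Incident Count': [3, 1, 2],
})

# dict(iter(groupby)) yields {key: sub-frame}, so each site's rows
# can be exported to its own HTML file
per_site = dict(iter(df.groupby('Site Name')))
print(sorted(per_site))         # the group keys, i.e. the site names
print(len(per_site['Site A']))  # number of rows in Site A's sub-frame
```

# Iterating over the dictionary keys then gives one filename and one ``to_html()`` call per site, exactly as the export loops in this section do.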
# In[26]: dict_site_1_1 = dict(iter(dis1table1final.groupby('Site Name'))) # print(dict_site_1_1) # if wanting to see pd.set_option('colheader_justify', 'center') # FOR TABLE html_string = ''' {table} . ''' for i in dict_site_1_1: with open(str(path) + '/Districts/District1/Site_Tables/'+ str(i) + '_table1' + '.html', 'w') as f: f.write(html_string.format(table=dict_site_1_1[i].to_html(classes='mystyle'))) # There is now one table for each property saved in the site table folder. Each table should look something like this, with Frederick Cione Playground being the example in this screenshot: # ![table1snap.PNG](attachment:table1snap.PNG) # ##### Table 2 # Our second site pivot table is similar to our first, but it represents incident count by crime type (``GIS_PART``) instead. The layout and export steps are very similar to the first table. # In[27]: # set up the pivot table dis1table2 = dis1_tables.pivot_table(values='incident_count_new', index=['PUBLIC_NAME', 'GIS_PART'], aggfunc=np.sum) dis1table2.reset_index(inplace=True) dis1table2.head(10) # In[28]: # adding our sum dataframe for the totals rows and concatenating it to the original table 2 dataframe - same as before. 
dis1table2_sum = dis1table2.groupby(['PUBLIC_NAME'], as_index=False)['incident_count_new'].sum() dis1table2_sum['grand_total'] = 'Total Incidents' dis1table2.sort_values(by=['PUBLIC_NAME'], ascending=[True], inplace=True) dis1table2col = dis1table2[['PUBLIC_NAME', 'GIS_PART', 'incident_count_new']] dis1table2final = pd.concat([dis1table2col, dis1table2_sum]) dis1table2final = dis1table2final.sort_values(by=['PUBLIC_NAME','grand_total'], ascending=[True,True], na_position='first') dis1table2final['GIS_PART'] = np.where(dis1table2final['grand_total'].isnull(), dis1table2final['GIS_PART'], dis1table2final['grand_total']) dis1table2final.drop(['grand_total'], axis=1, inplace=True) dis1table2final.fillna(value=0, inplace=True) dis1table2final.rename(columns = {'PUBLIC_NAME': 'Site Name', 'GIS_PART': 'Crime Type', 'incident_count_new': 'Incident Count'}, inplace = True) dis1table2final.head(10) # In[29]: # saving our table to a dictionary so each key (or site) will be saved separately dict_site_1_2 = dict(iter(dis1table2final.groupby('Site Name'))) # saving to HTMLs using our CSS stylesheet pd.set_option('colheader_justify', 'center') # FOR TABLE html_string = ''' {table} . ''' for i in dict_site_1_2: with open(str(path) + '/Districts/District1/Site_Tables/'+ str(i) + '_table2' + '.html', 'w') as f: f.write(html_string.format(table=dict_site_1_2[i].to_html(classes='mystyle'))) # Check the same site tables folder to make sure the tables have exported correctly. # Each table should look something like this, with Frederick Cione Playground being the example in this screenshot: # ![table2snap.PNG](attachment:table2snap.PNG) # ##### Table 3 # Our third table is a pivot table representing SUMMER incident counts by crime category (``GIS_CRIME``) broken up into active and non-active hours. # # Note that if no summer crimes occurred at a site, a table WILL NOT be generated. 
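# One defensive pattern (not in the original notebook) is to check for an empty season filter before pivoting, so a district with no summer crimes is skipped cleanly instead of erroring. A sketch using the notebook's column names on toy data:

```python
import pandas as pd

# toy slice standing in for dis1_tables (column names from the notebook)
df = pd.DataFrame({
    'PUBLIC_NAME': ['Site A', 'Site A', 'Site B'],
    'GIS_CRIME': ['Theft', 'Assault', 'Theft'],
    'SEASON': ['Off-Season', 'Off-Season', 'Off-Season'],
    'ACTIVE_HOURS': ['Active', 'Active', 'Non-Active'],
    'incident_count_new': [2, 1, 3],
})

def season_pivot(frame, season):
    """Return the active/non-active pivot for one season, or None if the
    season has no crimes (so no table should be generated)."""
    subset = frame[frame['SEASON'] == season]
    if subset.empty:
        return None
    pivot = subset.pivot_table(values='incident_count_new',
                               index=['PUBLIC_NAME', 'GIS_CRIME'],
                               columns=['ACTIVE_HOURS'], aggfunc='sum')
    # reindex guarantees both columns exist even if one hour type had no crimes,
    # removing the need to hand-edit the column list per district
    return (pivot.reindex(columns=['Active', 'Non-Active'], fill_value=0)
                 .fillna(0).reset_index())

print(season_pivot(df, 'Summer'))  # None: skip table 3 for this district
offseason = season_pivot(df, 'Off-Season')
print(offseason)
```

# A guard like this is one way the "may need to be modified or simply not run" caveats above could be folded into the code itself.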
# In[30]: dis1table3 = dis1_tables[dis1_tables['SEASON'] == 'Summer'] dis1table3 = dis1table3.pivot_table(values='incident_count_new', index=['PUBLIC_NAME', 'GIS_CRIME'], columns=['ACTIVE_HOURS'], aggfunc=np.sum) dis1table3.reset_index(inplace=True) dis1table3.fillna(value=0, inplace=True) # comment out if you don't want the total row for either active/non-active column to display 0 incidents dis1table3.head(10) # The next steps are very similar to the first 2 site tables, but since most sites have very few summer crimes, the cell blocks for this third table may need to be modified or simply not run. # In[31]: # adding in our dataframe for the site totals - note the double brackets: selecting multiple columns with a bare tuple no longer works in current pandas dis1table3_sum = dis1table3.groupby(['PUBLIC_NAME'], as_index=False)[['Active', 'Non-Active']].sum() # may need to remove one of the active hours types if there are no crimes in that season dis1table3_sum['grand_total'] = 'Total Incidents' dis1table3.sort_values(by=['PUBLIC_NAME'], ascending=[True], inplace=True) # In[32]: # setting up our site totals, concatenating them to our original dataframe, and tweaking the table design dis1table3col = dis1table3[['PUBLIC_NAME', 'GIS_CRIME', 'Active', 'Non-Active']] # may need to remove one of the active hours types if there are no crimes in that season dis1table3final = pd.concat([dis1table3col, dis1table3_sum]) dis1table3final = dis1table3final.sort_values(by=['PUBLIC_NAME','grand_total'], ascending=[True,True], na_position='first') dis1table3final['GIS_CRIME'] = np.where(dis1table3final['grand_total'].isnull(), dis1table3final['GIS_CRIME'], dis1table3final['grand_total']) dis1table3final.drop(['grand_total'], axis=1, inplace=True) dis1table3final.fillna(value=0, inplace=True) dis1table3final.rename(columns = {'PUBLIC_NAME': 'Site Name', 'GIS_CRIME': 'Crime Category - Summer'}, inplace = True) dis1table3final.head(50) # In[33]: # saving our table to a dictionary so each key (or site) will be saved separately dict_1_table3 = dict(iter(dis1table3final.groupby('Site Name'))) # saving 
to HTMLs using our CSS stylesheet pd.set_option('colheader_justify', 'center') # FOR TABLE html_string = ''' {table} . ''' for i in dict_1_table3: with open(str(path) + '/Districts/District1/Site_Tables/'+ str(i) + '_table3' + '.html', 'w') as f: f.write(html_string.format(table=dict_1_table3[i].to_html(classes='mystyle'))) # Each table should look something like this, with James Otis Ford being the example in this screenshot: # ![table3snap_jamesotisford.PNG](attachment:table3snap_jamesotisford.PNG) # ##### Table 4 # Our fourth table is another pivot table representing the same information as table 3, but for off-season rather than summer. The process is very much the same, and the same potential warnings in the second cell block could come up. # In[34]: # set up our pivot table dis1table4 = dis1_tables[dis1_tables['SEASON'] == 'Off-Season'] dis1table4 = dis1table4.pivot_table(values='incident_count_new', index=['PUBLIC_NAME', 'GIS_CRIME'], columns=['ACTIVE_HOURS'], aggfunc=np.sum) dis1table4.reset_index(inplace=True) dis1table4.fillna(value=0, inplace=True) # comment out if you don't want the total row for either active/non-active column to display 0 incidents dis1table4.head(10) # In[35]: # add the totals for each site - double brackets again for the multi-column selection dis1table4_sum = dis1table4.groupby(['PUBLIC_NAME'], as_index=False)[['Active', 'Non-Active']].sum() # might need to change dis1table4_sum['grand_total'] = 'Total Incidents' dis1table4.sort_values(by=['PUBLIC_NAME'], ascending=[True], inplace=True) # In[36]: # setting up our site totals, concatenating them to our original dataframe, and tweaking the table design dis1table4col = dis1table4[['PUBLIC_NAME', 'GIS_CRIME', 'Active', 'Non-Active']] # might need to change dis1table4final = pd.concat([dis1table4col, dis1table4_sum]) dis1table4final = dis1table4final.sort_values(by=['PUBLIC_NAME','grand_total'], ascending=[True,True], na_position='first') dis1table4final['GIS_CRIME'] = np.where(dis1table4final['grand_total'].isnull(), dis1table4final['GIS_CRIME'], 
dis1table4final['grand_total']) dis1table4final.drop(['grand_total'], axis=1, inplace=True) dis1table4final.fillna(value=0, inplace=True) dis1table4final.rename(columns = {'PUBLIC_NAME': 'Site Name', 'GIS_CRIME': 'Crime Category - Off-Season'}, inplace = True) dis1table4final.head(10) # In[37]: # saving our table to a dictionary so each key (or site) will be saved separately dict_1_table4 = dict(iter(dis1table4final.groupby('Site Name'))) # saving to HTMLs using our CSS stylesheet pd.set_option('colheader_justify', 'center') # FOR TABLE html_string = ''' {table} . ''' for i in dict_1_table4: with open(str(path) + '/Districts/District1/Site_Tables/'+ str(i) + '_table4' + '.html', 'w') as f: f.write(html_string.format(table=dict_1_table4[i].to_html(classes='mystyle'))) # Each table should look something like this, with Francis Cione Playground being the example in this screenshot: # ![table4snap.PNG](attachment:table4snap.PNG) # The rest of the code continues for each district. There are many improvements planned, such as 1) further automating the code to reduce changing the code (such as by writing custom functions to automatically proceed through errors) and 2) implementing new table designs to only have to generate 2 tables per site rather than 4. # # When this code was written, the outputs (PNGs and HTMLs) were still being manually added into a PowerPoint to format the report. Suggested improvements using this method were to use additional Python libraries to programmatically convert HTMLs to PNGs so that they could be more easily inserted into the PowerPoint (at the time of writing the code, installing new libraries that were not already included in the default ArcGIS Pro Python environment was not possible). Beyond using a PowerPoint to format the report, I suggest that future analysts automate the production of the report entirely instead of just automating the creation of these tables and charts. 
There are also redesigns I would suggest for the analysis itself, such as taking the area of each property or the population of the area surrounding each property into account when determining the significance of crime instead of just representing crime by the raw number of incidents.
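# That suggested redesign could start as simply as dividing raw counts by property area. A sketch on toy data, where the ``ACREAGE`` column and site names are hypothetical (no such field is used elsewhere in this Notebook):

```python
import pandas as pd

# hypothetical per-site summary: raw counts plus property acreage
sites = pd.DataFrame({
    'PUBLIC_NAME': ['Small Playground', 'Large Park'],
    'incident_count': [10, 30],
    'ACREAGE': [2.0, 120.0],  # hypothetical area column
})

# incidents per acre gives a density measure instead of a raw count:
# the small playground's 10 incidents now read as far more significant
# than the large park's 30
sites['incidents_per_acre'] = sites['incident_count'] / sites['ACREAGE']
print(sites)
```

# The same division pattern would work for a population-based denominator, if a surrounding-population field were joined onto the sites layer.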