One of the most controversial issues in the U.S. educational system is the effectiveness of standardized tests, and whether they're unfair to certain groups.
One such exam is the SAT. The SAT, or Scholastic Aptitude Test, is an exam that U.S. high school students take before applying to college. Colleges take the test scores into account when deciding whom to admit, so it's fairly important to perform well on it.
In this project, we will collect data from various sources and determine whether SAT scores are unfair to certain groups.
New York City has a significant immigrant population and is very diverse, so comparing demographic factors such as race, income, and gender with SAT scores is a good way to determine whether the SAT is a fair test. For example, if certain racial groups consistently perform better on the SAT, we would have some evidence that the SAT is unfair.
For the purposes of this project, we'll be using publicly available data about New York City public schools.
Below are the datasets that we will be using:
SAT scores by school - SAT scores for each high school in New York City
School attendance - Attendance information for each school in New York City
Class size - Information on class size for each school
AP test results - Advanced Placement (AP) exam results for each high school (passing an optional AP exam in a particular subject can earn a student college credit in that subject)
Graduation outcomes - The percentage of students who graduated, and other outcome information
Demographics - Demographic information for each school
School survey - Surveys of parents, teachers, and students at each school
All of these data sets are interrelated. We'll combine them into a single data set, `combined`, which we'll use in our analysis to look for correlations.
import pandas as pd
import numpy
import re
data_files = [
"ap_2010.csv",
"class_size.csv",
"demographics.csv",
"graduation.csv",
"hs_directory.csv",
"sat_results.csv"
]
data = {}
for f in data_files:
    d = pd.read_csv("schools/{0}".format(f))
    data[f.replace(".csv", "")] = d
for item in data:
    print(item)
data['ap_2010'].head()
data['class_size'].head()
data['demographics'].head()
data['graduation'].head()
data['hs_directory'].head()
data['sat_results'].head()
Each data set appears to either have a DBN column, or the information we need to create one. The DBN, or District Borough Number, is a unique code for each school. After some processing to create a DBN column in each data set, we can combine the columns of all the data sets together.
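To make that concrete, here's a minimal sketch of how a DBN breaks apart (the value below is just an illustration of the format):
dbn = "01M015"   # two-digit district, borough letter, school number
print(dbn[:2])   # "01"  - the district (we'll use this prefix again later for mapping)
print(dbn[2])    # "M"   - the borough code (Manhattan)
print(dbn[3:])   # "015" - the school number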
all_survey = pd.read_csv("schools/survey_all.txt", delimiter="\t", encoding='windows-1252')
d75_survey = pd.read_csv("schools/survey_d75.txt", delimiter="\t", encoding='windows-1252')
survey = pd.concat([all_survey, d75_survey], axis=0, sort=True)
survey.head()
survey['dbn'].head()
There are two immediate facts that we can see in the data: the survey contains far more columns than we need, and its school identifier is a lowercase dbn rather than the DBN used in the other data sets.
We'll need to filter the columns to remove the ones we don't need, and copy dbn into an uppercase DBN column. Luckily, there's a data dictionary at the original data download location that tells us what each column represents. We'll pick the columns containing aggregate survey data about how parents, teachers, and students feel about school safety, academic performance, and more.
survey["DBN"] = survey["dbn"]
survey_fields = [
"DBN",
"rr_s",
"rr_t",
"rr_p",
"N_s",
"N_t",
"N_p",
"saf_p_11",
"com_p_11",
"eng_p_11",
"aca_p_11",
"saf_t_11",
"com_t_11",
"eng_t_11",
"aca_t_11",
"saf_s_11",
"com_s_11",
"eng_s_11",
"aca_s_11",
"saf_tot_11",
"com_tot_11",
"eng_tot_11",
"aca_tot_11",
]
survey = survey.loc[:,survey_fields]
data["survey"] = survey
data['survey'].head()
When we explored all of the data sets, we noticed that some of them, like `class_size` and `hs_directory`, don't have a `DBN` column.
`hs_directory` does have a `dbn` column, though, so we can just rename it. `class_size` doesn't appear to have the column at all, so we'll construct it ourselves by concatenating the `CSD` and `SCHOOL CODE` columns, adding a leading 0 to the `CSD` when it's less than two digits long.
data["hs_directory"]["DBN"] = data["hs_directory"]["dbn"]
def pad_csd(num):
    string_representation = str(num)
    if len(string_representation) > 1:
        return string_representation
    else:
        return "0" + string_representation
data["class_size"]["padded_csd"] = data["class_size"]["CSD"].apply(pad_csd)
data["class_size"]["DBN"] = data["class_size"]["padded_csd"] + data["class_size"]["SCHOOL CODE"]
print(data['hs_directory']['DBN'].head())
data["class_size"]['DBN'].head()
Next, we'll do a bit of column cleanup:
- Convert the AP exam columns in `ap_2010` from object (string) data type to numeric data type.
- Convert the SAT score columns in `sat_results` to numeric, and then create a new column that totals up the SAT scores.
- Extract the latitude and longitude from the `Location 1` column in the `hs_directory` data set. We'll need some regex along with string manipulation to pull them into two new columns, and then convert them to numeric format.
cols = ['SAT Math Avg. Score', 'SAT Critical Reading Avg. Score', 'SAT Writing Avg. Score']
for c in cols:
data["sat_results"][c] = pd.to_numeric(data["sat_results"][c], errors="coerce")
data['sat_results']['sat_score'] = data['sat_results'][cols[0]] + data['sat_results'][cols[1]] + data['sat_results'][cols[2]]
cols = ['AP Test Takers ', 'Total Exams Taken', 'Number of Exams with scores 3 4 or 5']
for col in cols:
data["ap_2010"][col] = pd.to_numeric(data["ap_2010"][col], errors="coerce")
def find_lat(loc):
    coords = re.findall(r"\(.+, .+\)", loc)
    lat = coords[0].split(",")[0].replace("(", "")
    return lat

def find_lon(loc):
    coords = re.findall(r"\(.+, .+\)", loc)
    lon = coords[0].split(",")[1].replace(")", "").strip()
    return lon
data["hs_directory"]["lat"] = data["hs_directory"]["Location 1"].apply(find_lat)
data["hs_directory"]["lon"] = data["hs_directory"]["Location 1"].apply(find_lon)
data["hs_directory"]["lat"] = pd.to_numeric(data["hs_directory"]["lat"], errors="coerce")
data["hs_directory"]["lon"] = pd.to_numeric(data["hs_directory"]["lon"], errors="coerce")
Next, we'll condense the data sets so that there is only one `DBN` value per row. This will help us when combining the data sets.
- `class_size`: each school has multiple rows, one per combination of `GRADE`, `PROGRAM TYPE`, `CORE SUBJECT (MS CORE and 9-12 ONLY)`, and `CORE COURSE (MS CORE and 9-12 ONLY)`. We will filter on `GRADE` equal to `09-12` and `PROGRAM TYPE` equal to `GEN ED` to keep only the general education classes in the grades where the SAT is taken. Then we'll group by `DBN` and take the average of each group using the agg() method, which gives unique values per `DBN`.
- `demographics`: here, the only column that prevents a given `DBN` from being unique is `schoolyear`. We only want to select rows where `schoolyear` is `20112012`. This will give us the most recent year of data, and also match our SAT results data.
- `graduation`: the `Demographic` and `Cohort` columns are what prevent `DBN` from being unique in the graduation data. A `Cohort` appears to refer to the year the data represents, and the `Demographic` appears to refer to a specific demographic group. In this case, we want data from the most recent `Cohort` available, which is `2006`. We also want data from the full cohort, so we'll only pick rows where `Demographic` is `Total Cohort`.
class_size = data["class_size"]
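# Note: the "GRADE " column name includes a trailing space in the raw data.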
class_size = class_size[class_size["GRADE "] == "09-12"]
class_size = class_size[class_size["PROGRAM TYPE"] == "GEN ED"]
class_size = class_size.groupby("DBN").agg(numpy.mean)
class_size.reset_index(inplace=True)
data["class_size"] = class_size
data["demographics"] = data["demographics"][data["demographics"]["schoolyear"] == 20112012]
data["graduation"] = data["graduation"][data["graduation"]["Cohort"] == "2006"]
data["graduation"] = data["graduation"][data["graduation"]["Demographic"] == "Total Cohort"]
Now we're ready to combine the data sets. We'll merge `sat_results` with one data set, then merge the result with another, and continue until we've merged all of them. Afterwards, we'll have roughly the same number of rows, but each row will have columns from all of the data sets.
We want to preserve as many rows from `sat_results` as possible while minimizing null values, so we'll use different merge strategies: left joins for data sets that are missing `DBN` values for many schools, like `ap_2010` and `graduation`, and inner joins for data sets that cover nearly every school, like `demographics`.
combined = data["sat_results"]
combined = combined.merge(data["ap_2010"], on="DBN", how="left")
combined = combined.merge(data["graduation"], on="DBN", how="left")
to_merge = ["class_size", "demographics", "survey", "hs_directory"]
for m in to_merge:
    combined = combined.merge(data[m], on="DBN", how="inner")
combined = combined.fillna(combined.mean())
combined = combined.fillna(0)
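# Quick sanity check: how many schools (rows) and columns survived the merges?
print(combined.shape)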
pd.set_option('display.max_columns', 500)
combined.head()
Now, we have a clean data set on which we can base our analysis. Mapping the statistics out on a school district level might be an interesting way to analyze them. Adding a column to the data set that specifies the school district will help us accomplish this.
def get_first_two_chars(dbn):
    return dbn[0:2]
combined["school_dist"] = combined["DBN"].apply(get_first_two_chars)
combined.head(3)
correlations = combined.corr()
correlations = correlations["sat_score"]
print(correlations)
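The full list is long. As a minimal sketch (an optional step, not part of the original walkthrough), we can narrow it to the columns most strongly correlated with `sat_score`:
strong = correlations[correlations.abs() > 0.25].sort_values(ascending=False)
print(strong)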
# Remove DBN since it's a unique identifier, not a useful numerical value for correlation.
survey_fields.remove("DBN")
import matplotlib.pyplot as plt
combined.corr()['sat_score'][survey_fields].plot.bar()
plt.show()
Key Observations:
- `N_s`, `N_t`, and `N_p` correlate highly with `sat_score`. Since these fields count survey respondents, they are directly related to total enrollment, so this is understandable.
- `rr_s` (the student response rate) is the more interesting point: students who excel academically may be more likely to respond to a survey, pulling `sat_score` up alongside the response rate.
- `saf_t_11`, `saf_s_11`, and `saf_tot_11` give us another good perspective on the relationship between a safe environment and academic performance. Schools where students and teachers feel safe are more likely to have a good `sat_score`.
- `aca_s_11`, i.e. how students perceive academic standards, correlates highly with `sat_score`. However, the same is not true for `aca_p_11` and `aca_t_11`, i.e. how parents and teachers perceive academic standards.
combined.plot.scatter('saf_s_11', 'sat_score')
plt.show()
From the scatter plot, we can observe that there seems to be a positive correlation between `sat_score` and `saf_s_11`, although it's not that strong.
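We can put a number on that impression by computing the correlation coefficient directly:
r = combined['saf_s_11'].corr(combined['sat_score'])
print("Pearson r between saf_s_11 and sat_score: {:.2f}".format(r))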
districts = combined.groupby('school_dist').agg(numpy.mean)
districts.reset_index(inplace=True)
districts
import numpy as np
from mpl_toolkits.basemap import Basemap
m = Basemap(
    projection='merc',
    llcrnrlat=40.496044,
    urcrnrlat=40.915256,
    llcrnrlon=-74.255735,
    urcrnrlon=-73.700272,
    resolution='f'
)
m.drawmapboundary(fill_color='#ABD1FF')
m.drawcoastlines(color='black', linewidth=.4)
m.drawrivers(color='#ABD1FF', linewidth=.4)
m.fillcontinents(color='#F2F1EF', lake_color='#ABD1FF')
longitudes = districts['lon'].tolist()
latitudes = districts['lat'].tolist()
m.scatter(longitudes, latitudes, s=50, zorder=2, latlon=True, c=districts['saf_tot_11'], cmap='viridis')
plt.axis('equal')
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.colorbar(label='Safety Score')
plt.title('New York City: Schools Safety Score')
plt.savefig('new_york_schools.png')
plt.show()
From the map, we can observe that parts of Manhattan, the Bronx, and Queens have relatively higher safety scores, while Brooklyn has mostly lower safety scores.
There are a few columns that indicate the percentage of each race at a given school:
- `white_per`
- `asian_per`
- `black_per`
- `hispanic_per`
By plotting the correlations between these columns and `sat_score` as a bar plot, we can determine whether there are any racial differences in SAT performance.
races_per = ['white_per', 'asian_per', 'black_per', 'hispanic_per']
combined.corr()['sat_score'][races_per].plot.bar()
plt.show()
From the bar plot above, we can see that a higher percentage of White or Asian students at a school correlates with higher SAT scores, and the reverse holds for Black and Hispanic students. This could be due to various external factors, like the economic background of students, school funding, etc.
We will continue exploring the relationship between race and SAT scores by making a scatter plot of `hispanic_per` vs. `sat_score`.
combined.plot.scatter(x='hispanic_per', y='sat_score')
plt.show()
There seems to be a negative correlation between the percentage of Hispanic students and SAT score. This may be due to a lack of funding for schools in certain areas, which are more likely to have a higher percentage of Black or Hispanic students.
Zooming in on the Hispanic population by looking at schools with a `hispanic_per` greater than 95%.
combined[combined['hispanic_per']>95]['SCHOOL NAME']
We can observe from our findings that the schools above cater mostly to immigrant communities that have recently arrived in the U.S. from Spanish-speaking countries. Many of their students are still learning English, which plausibly corresponds to lower SAT scores, since the SAT is administered in English.
Researching schools with a hispanic_per less than 10% and an average SAT score greater than 1800.
combined[(combined['hispanic_per'] < 10) & (combined['sat_score'] > 1800)]['SCHOOL NAME']
From our findings, we observe that the schools above specialize in science and technology and have admission criteria (specific entrance exams), which explains the high SAT scores. However, there is no obvious reason for the low proportion of Hispanic students.
There are two columns that indicate the percentage of each gender at a school: `male_per` and `female_per`.
We can plot the correlations between each percentage and `sat_score` by making a bar plot.
gender_per = ['male_per', 'female_per']
combined.corr()['sat_score'][gender_per].plot.bar()
plt.show()
From the bar plot, we can observe that a higher percentage of female students correlates positively with SAT scores, and the reverse holds for male students. However, neither correlation is strong.
Investigating schools with high SAT scores and a high female_per by making a scatter plot of female_per vs. sat_score.
combined.plot.scatter(x='female_per', y='sat_score')
plt.show()
From the scatter plot above, there doesn't seem to be a strong relationship between SAT scores and the percentage of female students.
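To check that visual impression numerically, we can compute the correlation directly:
print(combined['female_per'].corr(combined['sat_score']))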
Researching schools with a female_per greater than 60% and an average SAT score greater than 1700.
combined[(combined['sat_score'] > 1700) & (combined['female_per'] > 60)]['SCHOOL NAME']
From our findings, we notice that the above schools focus primarily on two things:
- Arts
- Preparing students for college
These appear to be very selective liberal arts schools that have high academic standards.
There isn't enough evidence to claim that SAT scores are unfair to certain groups. On the surface, it looks like they might be, but on further investigation we can see that there are multiple other factors (like immigration, funding, etc.) that may influence academic performance.