Are SATs fair?

The aim of this project is to explore the relationship between SAT scores and demographic factors (race, gender, etc.) of NYC high schools.

Dataset Description

New York City has a very diverse population, so comparing SAT scores against demographic factors can help us figure out whether the SAT is fair.

The SAT (Scholastic Aptitude Test) is a test that high school seniors in the US take every year. It has three sections, each worth a maximum of 800 points, for a maximum total of 2400. A high average SAT score is usually taken as a sign that a school is good. Note that we are discussing the old SAT exam pattern here; the new one is different, and you can read more about the new one in this link.

Dataset Links            Descriptions
SAT Results              SAT results of New York high schools
School Attendance        Attendance details of each school
Class size               Information on class size for each school
AP test results data     Advanced Placement (AP) exam results for each high school; AP exams help students earn college credit
Graduation details       Percentage of students who graduated, with details for multiple years for every school
Demographic details      Demographic information for every school in NYC
School Survey results    Surveys of parents, teachers, and students at each school

Read in the data

Let's start our analysis by reading in the datasets.

In [1]:
#Importing all the necessary libraries
import pandas as pd
import numpy
import re
import matplotlib.pyplot as plt
import matplotlib.style as style
%matplotlib inline
data_files = [
    "ap_2010.csv",
    "class_size.csv",
    "demographics.csv",
    "graduation.csv",
    "hs_directory.csv",
    "sat_results.csv"
]

data = {}

for f in data_files:
    d = pd.read_csv("schools/{0}".format(f))
    data[f.replace(".csv", "")] = d

Read in the surveys

In [2]:
#The survey files have a different form of encoding
all_survey = pd.read_csv("schools/survey_all.txt", delimiter="\t", encoding='windows-1252')
d75_survey = pd.read_csv("schools/survey_d75.txt", delimiter="\t", encoding='windows-1252')
survey = pd.concat([all_survey, d75_survey], axis=0)

survey["DBN"] = survey["dbn"]

survey_fields = [
    "DBN", 
    "rr_s", 
    "rr_t", 
    "rr_p", 
    "N_s", 
    "N_t", 
    "N_p", 
    "saf_p_11", 
    "com_p_11", 
    "eng_p_11", 
    "aca_p_11", 
    "saf_t_11", 
    "com_t_11", 
    "eng_t_11", 
    "aca_t_11", 
    "saf_s_11", 
    "com_s_11", 
    "eng_s_11", 
    "aca_s_11", 
    "saf_tot_11", 
    "com_tot_11", 
    "eng_tot_11", 
    "aca_tot_11",
]
survey = survey.loc[:,survey_fields]
data["survey"] = survey
data["survey"].head()
Out[2]:
DBN rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 ... eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
0 01M015 NaN 88 60 NaN 22.0 90.0 8.5 7.6 7.5 ... 7.6 7.9 NaN NaN NaN NaN 8.0 7.7 7.5 7.9
1 01M019 NaN 100 60 NaN 34.0 161.0 8.4 7.6 7.6 ... 8.9 9.1 NaN NaN NaN NaN 8.5 8.1 8.2 8.4
2 01M020 NaN 88 73 NaN 42.0 367.0 8.9 8.3 8.3 ... 6.8 7.5 NaN NaN NaN NaN 8.2 7.3 7.5 8.0
3 01M034 89.0 73 50 145.0 29.0 151.0 8.8 8.2 8.0 ... 6.8 7.8 6.2 5.9 6.5 7.4 7.3 6.7 7.1 7.9
4 01M063 NaN 100 60 NaN 23.0 90.0 8.7 7.9 8.1 ... 7.8 8.1 NaN NaN NaN NaN 8.5 7.6 7.9 8.0

5 rows × 23 columns

The DBN column is a unique identifier in the survey and school datasets. We need to make sure that this column is present in all the datasets.

Add DBN columns

The hs_directory and class_size datasets don't have a DBN column. In the hs_directory dataset the column is present as dbn, so we just have to copy it under the name DBN. In the class_size dataset, however, the DBN has to be built from a combination of other columns.

In [3]:
data["hs_directory"]["DBN"] = data["hs_directory"]["dbn"]
data["class_size"].head()
Out[3]:
CSD BOROUGH SCHOOL CODE SCHOOL NAME GRADE PROGRAM TYPE CORE SUBJECT (MS CORE and 9-12 ONLY) CORE COURSE (MS CORE and 9-12 ONLY) SERVICE CATEGORY(K-9* ONLY) NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS DATA SOURCE SCHOOLWIDE PUPIL-TEACHER RATIO
0 1 M M015 P.S. 015 Roberto Clemente 0K GEN ED - - - 19.0 1.0 19.0 19.0 19.0 ATS NaN
1 1 M M015 P.S. 015 Roberto Clemente 0K CTT - - - 21.0 1.0 21.0 21.0 21.0 ATS NaN
2 1 M M015 P.S. 015 Roberto Clemente 01 GEN ED - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
3 1 M M015 P.S. 015 Roberto Clemente 01 CTT - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
4 1 M M015 P.S. 015 Roberto Clemente 02 GEN ED - - - 15.0 1.0 15.0 15.0 15.0 ATS NaN

The DBN here seems to be a combination of the CSD and SCHOOL CODE columns. However, we need to pad CSD with a leading 0 in order to make it a two-digit number.

In [4]:
#Converting single digit numbers to two digit by padding it with 0
data["class_size"]["padded_csd"] = data["class_size"]["CSD"].apply(lambda x:str(x).zfill(2))
data["class_size"]["DBN"] = data["class_size"]["padded_csd"] + data["class_size"]["SCHOOL CODE"]
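
The padding rule above can be checked on a couple of values without loading the dataset. This is a minimal sketch: the helper name make_dbn and the second school code ("X245") are made up for illustration, while 01M015 matches the first row shown in the class_size output.

```python
def make_dbn(csd, school_code):
    """Build a DBN by zero-padding the district number to two digits."""
    return str(csd).zfill(2) + school_code


# District 1 needs a leading zero; district 12 is already two digits.
print(make_dbn(1, "M015"))   # 01M015
print(make_dbn(12, "X245"))  # 12X245 (hypothetical school code)
```

On the full column, the same padding could also be written with the vectorized string accessor, for example data["class_size"]["CSD"].astype(str).str.zfill(2), instead of apply with a lambda.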

Convert columns to numeric

We are only interested in the total SAT score, so we convert the three SAT component score columns in the sat_results dataset to numeric and add them up to get the total.

In [5]:
cols = ['SAT Math Avg. Score', 'SAT Critical Reading Avg. Score', 'SAT Writing Avg. Score']
for c in cols:
    data["sat_results"][c] = pd.to_numeric(data["sat_results"][c], errors="coerce")

data['sat_results']['sat_score'] = data['sat_results'][cols[0]] + data['sat_results'][cols[1]] + data['sat_results'][cols[2]]
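
To see what errors="coerce" does to the total, here is a small sketch on made-up values. The short column names and the "s" placeholder are assumptions for illustration; the notebook does not show which non-numeric strings the real file contains.

```python
import pandas as pd

# Illustrative values only; "s" stands in for any non-numeric placeholder.
scores = pd.DataFrame({
    "math": ["404", "s", "450"],
    "reading": ["355", "s", "410"],
    "writing": ["363", "s", "400"],
})

# Strings that cannot be parsed become NaN instead of raising an error.
for col in scores.columns:
    scores[col] = pd.to_numeric(scores[col], errors="coerce")

# Adding columns with + propagates NaN: a school with any missing
# component gets a missing total rather than a misleading partial sum.
scores["total"] = scores["math"] + scores["reading"] + scores["writing"]
print(scores)
```

The missing totals matter later, because the notebook fills remaining NaN values with column means after merging.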

Figuring out precise locations

We will use the Location 1 column of the hs_directory dataset to get the coordinates of each school. With coordinates we can place the schools on a map and uncover any geographical patterns.

In [6]:
print (data['hs_directory']['Location 1'][1])
1110 Boston Road
Bronx, NY 10456
(40.8276026690005, -73.90447525699966)

The coordinates sit in parentheses at the end of the address, so we need to extract the latitude and longitude using a regular expression.

In [7]:
#Functions for extracting latitude and longitude from the Location 1 column of the hs_directory dataset.
#Raw strings (r"...") keep \( and \) as regex escapes instead of invalid string escapes.
def find_lat(loc):
    coords = re.findall(r"\(.+, .+\)", loc)
    lat = coords[0].split(",")[0].replace("(", "")
    return lat

def find_lon(loc):
    coords = re.findall(r"\(.+, .+\)", loc)
    lon = coords[0].split(",")[1].replace(")", "").strip()
    return lon

data["hs_directory"]["lat"] = data["hs_directory"]["Location 1"].apply(find_lat)
data["hs_directory"]["lon"] = data["hs_directory"]["Location 1"].apply(find_lon)

data["hs_directory"]["lat"] = pd.to_numeric(data["hs_directory"]["lat"], errors="coerce")
data["hs_directory"]["lon"] = pd.to_numeric(data["hs_directory"]["lon"], errors="coerce")
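
As an alternative to two separate functions, both coordinates can be pulled out in one pass with capture groups. This is only a sketch: parse_lat_lon and COORD_RE are hypothetical names, and returning (None, None) for rows without coordinates is a design choice of this example, not something the notebook does.

```python
import re

# One signed decimal number, a comma, then another, all inside parentheses.
COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_lat_lon(loc):
    """Return (lat, lon) as floats, or (None, None) if no pair is found."""
    match = COORD_RE.search(loc)
    if match is None:
        return None, None
    return float(match.group(1)), float(match.group(2))


# The sample address printed earlier in the notebook.
sample = "1110 Boston Road\nBronx, NY 10456\n(40.8276026690005, -73.90447525699966)"
print(parse_lat_lon(sample))
```

Because the function returns floats directly, the later pd.to_numeric step would not be needed for columns built this way.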

Condense datasets

Some datasets contain several rows per school, so we will condense them until each DBN appears only once.

In [8]:
class_size = data["class_size"]
class_size.head(2)
Out[8]:
CSD BOROUGH SCHOOL CODE SCHOOL NAME GRADE PROGRAM TYPE CORE SUBJECT (MS CORE and 9-12 ONLY) CORE COURSE (MS CORE and 9-12 ONLY) SERVICE CATEGORY(K-9* ONLY) NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS DATA SOURCE SCHOOLWIDE PUPIL-TEACHER RATIO padded_csd DBN
0 1 M M015 P.S. 015 Roberto Clemente 0K GEN ED - - - 19.0 1.0 19.0 19.0 19.0 ATS NaN 01 01M015
1 1 M M015 P.S. 015 Roberto Clemente 0K CTT - - - 21.0 1.0 21.0 21.0 21.0 ATS NaN 01 01M015
In [9]:
class_size["GRADE "].value_counts()
Out[9]:
09-12      10644
MS Core     4762
0K-09       1384
0K          1237
01          1185
02          1167
03          1143
04          1140
05          1086
06           846
07           778
08           735
09            20
Name: GRADE , dtype: int64
In [10]:
class_size["PROGRAM TYPE"].value_counts()
Out[10]:
GEN ED     14545
CTT         7460
SPEC ED     3653
G&T          469
Name: PROGRAM TYPE, dtype: int64
In [11]:
class_size = class_size[class_size["GRADE "] == "09-12"]
class_size = class_size[class_size["PROGRAM TYPE"] == "GEN ED"]

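# Average the numeric columns per school; numeric_only skips text columns such as SCHOOL NAME.
class_size = class_size.groupby("DBN").mean(numeric_only=True)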
class_size.reset_index(inplace=True)
data["class_size"] = class_size
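
The effect of the groupby step is easier to see on a tiny frame. The rows below are invented for illustration (the DBN values and class sizes are not taken from the dataset); the point is that several rows per school collapse into one row holding the mean.

```python
import pandas as pd

# Made-up rows: one school with two subjects, another with one.
rows = pd.DataFrame({
    "DBN": ["01M292", "01M292", "01M448"],
    "CORE SUBJECT": ["ENGLISH", "MATH", "ENGLISH"],
    "AVERAGE CLASS SIZE": [20.0, 24.0, 30.0],
})

# One row per DBN; the text column is dropped by numeric_only=True.
per_school = rows.groupby("DBN").mean(numeric_only=True).reset_index()
print(per_school)

# After condensing, DBN should be usable as a merge key.
assert per_school["DBN"].is_unique
```

A uniqueness check like the last line is a cheap safeguard before merging, since duplicate keys would silently multiply rows.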
In [12]:
data["demographics"].head()
Out[12]:
DBN Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 ... black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per
0 01M015 P.S. 015 ROBERTO CLEMENTE 20052006 89.4 NaN 281 15 36 40 33 ... 74 26.3 189 67.3 5 1.8 158.0 56.2 123.0 43.8
1 01M015 P.S. 015 ROBERTO CLEMENTE 20062007 89.4 NaN 243 15 29 39 38 ... 68 28.0 153 63.0 4 1.6 140.0 57.6 103.0 42.4
2 01M015 P.S. 015 ROBERTO CLEMENTE 20072008 89.4 NaN 261 18 43 39 36 ... 77 29.5 157 60.2 7 2.7 143.0 54.8 118.0 45.2
3 01M015 P.S. 015 ROBERTO CLEMENTE 20082009 89.4 NaN 252 17 37 44 32 ... 75 29.8 149 59.1 7 2.8 149.0 59.1 103.0 40.9
4 01M015 P.S. 015 ROBERTO CLEMENTE 20092010 96.5 208 16 40 28 32 ... 67 32.2 118 56.7 6 2.9 124.0 59.6 84.0 40.4

5 rows × 38 columns

In [13]:
data["demographics"] = data["demographics"][data["demographics"]["schoolyear"] == 20112012]

data["graduation"] = data["graduation"][data["graduation"]["Cohort"] == "2006"]
data["graduation"] = data["graduation"][data["graduation"]["Demographic"] == "Total Cohort"]

Convert AP scores to numeric

In [14]:
cols = ['AP Test Takers ', 'Total Exams Taken', 'Number of Exams with scores 3 4 or 5']

for col in cols:
    data["ap_2010"][col] = pd.to_numeric(data["ap_2010"][col], errors="coerce")

Combine the datasets

Now we need to combine the cleaned datasets into one for analysis.

  • Since the DBN column is a unique identifier present in all of our datasets, it will act as our key.
  • Since we are interested in SAT scores, we will use the sat_results dataset as the base for our analysis and merge the other datasets into it.
In [15]:
combined = data["sat_results"]

combined = combined.merge(data["ap_2010"], on="DBN", how="left")
combined = combined.merge(data["graduation"], on="DBN", how="left")

to_merge = ["class_size", "demographics", "survey", "hs_directory"]

for m in to_merge:
    combined = combined.merge(data[m], on="DBN", how="inner")

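# Fill missing numeric values with the column mean; numeric_only skips text columns.
combined = combined.fillna(combined.mean(numeric_only=True))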
combined = combined.fillna(0)
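
The choice between how="left" and how="inner" decides which schools survive the merge. Below is a small sketch with made-up frames (the DBN values "A", "B", "C" and all numbers are placeholders): a left merge keeps every SAT row and leaves gaps, while an inner merge drops schools missing from the other dataset.

```python
import pandas as pd

sat = pd.DataFrame({"DBN": ["A", "B", "C"], "sat_score": [1100, 1300, 1500]})
ap = pd.DataFrame({"DBN": ["A", "B"], "ap_takers": [10, 25]})
survey = pd.DataFrame({"DBN": ["A", "C"], "saf_s_11": [6.5, 7.2]})

# Left merge: school C has no AP data, so it stays with NaN in ap_takers.
left = sat.merge(ap, on="DBN", how="left")

# Inner merge: school B has no survey row, so it is dropped entirely.
inner = left.merge(survey, on="DBN", how="inner")

print(len(left), len(inner))  # 3 2
```

This is why the notebook uses left merges for the datasets with many missing schools and inner merges for the ones it needs for every row.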

Add a school district column for mapping

In [16]:
#The first two characters of the DBN represent the school district number
combined["school_dist"] = combined["DBN"].apply(lambda x : x[:2])

Find correlations

In [17]:
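# Recent pandas versions raise an error on text columns unless numeric_only=True is passed.
correlations = combined.corr(numeric_only=True)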
correlations = correlations["sat_score"]
print(correlations)
SAT Critical Reading Avg. Score    0.986820
SAT Math Avg. Score                0.972643
SAT Writing Avg. Score             0.987771
sat_score                          1.000000
AP Test Takers                     0.523140
                                     ...   
priority08                              NaN
priority09                              NaN
priority10                              NaN
lat                               -0.121029
lon                               -0.132222
Name: sat_score, Length: 67, dtype: float64
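
DataFrame.corr computes the Pearson correlation coefficient by default. For reference, the coefficient for two columns x and y is

r = sum((x_i - mean_x) * (y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2))

A plain-Python sketch of that formula follows; pearson_r is a hypothetical helper written for illustration, not part of the notebook.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# A perfectly linear increasing relationship gives r close to +1,
# a perfectly linear decreasing one gives r close to -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Values near 0 mean there is little linear relationship. The coefficient says nothing about causation, which is worth keeping in mind for the sections that follow.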

Plotting survey correlations

In [18]:
# Remove DBN since it's a unique identifier, not a useful numerical value for correlation.
survey_fields.remove("DBN")
In [19]:
survey_corr=correlations[survey_fields]
positive_survey_corr=survey_corr.apply(lambda x: 'strong' if abs(x)>=0.25 else 'weak')
color_map_corr=positive_survey_corr.map({'weak':'#33A1C9','strong':'#ff0000'})
style.use("fivethirtyeight")
xcoords={'aca_tot_11':0.4,
 'eng_tot_11':0.4,
 'com_tot_11':0.4,
 'saf_tot_11':0.405,
 'aca_s_11':0.405,
 'eng_s_11':0.405,
 'com_s_11':0.405,
 'saf_s_11':0.405,
 'aca_t_11':0.405,
 'eng_t_11':0.405,
 'com_t_11':0.405,
 'saf_t_11':0.405,
 'aca_p_11':0.405,
 'eng_p_11':0.405,
 'com_p_11':0.405,
 'saf_p_11':0.405,
 'N_p':0.425,
 'N_t':0.425,
 'N_s':0.425,
 'rr_p':0.425,
 'rr_t':0.425,
 'rr_s':0.425}
fig,ax=plt.subplots(figsize=(9,6))
ax.grid(False)
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.barh(survey_corr.index,survey_corr,height=0.5,left=-0.1,color=color_map_corr)
y_coord=21.0
for y_label,x_coord in xcoords.items():
    ax.text(x_coord, y_coord, y_label,color=color_map_corr[y_label])
    y_coord -= 1
ax.axvline(x=0.34,color='grey',alpha=0.3,linewidth=1,ymin=0.03,ymax=20)
ax.axhline(-0.5, color='grey', linewidth=1, alpha=0.5,
          xmin=0.03, xmax=0.94)
ax.text(-0.24, -1.4, '-0.1'+ ' '*106 + '+0.4',
        color='grey', alpha=0.5)
ax.text(-0.24, 23.0,
        'Safety Scores seem to show a more positive impact on SAT scores',
        size=17, weight='bold')
ax.text(-0.24, 22.0,
        'Investigating the correlations between SAT scores and survey fields')
plt.show()

My observations:

  • All of the fields are positively correlated with SAT scores except com_p_11 and rr_t. The com_p_11 field is the communication score based on parent responses, and rr_t is the teacher response rate.
  • The fields with strong positive correlations with SAT score are N_s, N_p, saf_t_11, saf_s_11, aca_s_11, and saf_tot_11.
  • The N_s and N_p fields are the numbers of student and parent respondents respectively. Their correlation is not surprising: more respondents generally means a larger school, and with more students there is a greater chance of high scores.
  • The columns saf_t_11, saf_s_11, and saf_tot_11 are the safety and respect scores based on teacher, student, and total responses respectively. Safety and respect can have a significant impact on test scores, as well as on students' overall performance.
  • The column aca_s_11 is the academic expectations score based on student responses. This can plausibly relate to SAT scores: students with high academic expectations are likely to be more serious about college, and thus more serious about the SAT.

Exploring Safety and SAT scores

In [20]:
combined.plot.scatter('saf_s_11','sat_score',title='Safety & Respect Plot Relation')
plt.show()

The positive correlation between the student-reported safety and respect score and the SAT score can be seen in the graph above. Besides this, we can observe that:

  • Most schools have a lower safety score, and their SAT scores fall between 1000 and 1400.
  • Schools with a safety score between 7 and 9 have higher SAT scores, mostly above 1600.
In [21]:
combined.groupby('boro')['saf_s_11'].agg(numpy.mean)
Out[21]:
boro
Bronx            6.606577
Brooklyn         6.370755
Manhattan        6.831370
Queens           6.721875
Staten Island    6.530000
Name: saf_s_11, dtype: float64

On average, all boroughs have similar safety and respect scores, with Brooklyn having the lowest and Manhattan the highest.

Exploring Race and SAT Scores

In [22]:
racial_cols=['white_per','asian_per','black_per','hispanic_per']
racial=correlations[racial_cols]
positive_racial_corr=racial.apply(lambda x: True if x>=0 else False)
color_map_corr_ra=positive_racial_corr.map({False:'#FFA500',True:'#ff0000'})
racial_coords={'hispanic_per':0.6,'black_per':0.6,'asian_per':0.6,'white_per':0.6}
fig,ax=plt.subplots(figsize=(8,4))
ax.grid(False)
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.barh(racial.index,racial,height=0.2,left=-0.1,color=color_map_corr_ra)
y_coord=3.0
for y_label,x_coord in racial_coords.items():
    ax.text(x_coord, y_coord, y_label,color=color_map_corr_ra[y_label])
    y_coord -= 1
ax.axvline(x=0.55,color='grey',alpha=0.3,linewidth=1,ymin=0.08,ymax=2.8)
ax.axhline(-0.3, color='grey', linewidth=1, alpha=0.5,
           xmin=0.03, xmax=0.95)
ax.text(-0.55, -0.60, '-0.4'+ ' '*95 + '+0.6',color='grey', alpha=0.5)
ax.text(-0.55, 4.0,
         'SAT scores have a strong relation with the % of white students',
         size=17, weight='bold')
ax.text(-0.55, 3.7,
         'Investigating the correlations between SAT scores and Race')

plt.show()

It seems that the racial makeup of a school does have a relationship with its SAT scores. We can observe that:

  • The percentages of Asian and white students have a strong positive correlation with SAT scores, with white_per having the stronger correlation of the two.
  • The percentages of Black and Hispanic students have a strong negative correlation with SAT scores, with hispanic_per having the stronger correlation of the two.
In [23]:
combined.plot.scatter('hispanic_per','sat_score')
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe7c5af65e0>

It seems that schools with hispanic_per below 20 are more likely to score above 1600 on the SAT. The correlation alone does not confirm a causal relationship, as there can be several other reasons behind the plot above. It would be better to look at the school profiles and search for them online.

In [24]:
combined[combined['hispanic_per']>95]['SCHOOL NAME']
Out[24]:
44                         MANHATTAN BRIDGES HIGH SCHOOL
82      WASHINGTON HEIGHTS EXPEDITIONARY LEARNING SCHOOL
89     GREGORIO LUPERON HIGH SCHOOL FOR SCIENCE AND M...
125                  ACADEMY FOR LANGUAGE AND TECHNOLOGY
141                INTERNATIONAL SCHOOL FOR LIBERAL ARTS
176     PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE
253                            MULTICULTURAL HIGH SCHOOL
286               PAN AMERICAN INTERNATIONAL HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

The list above contains schools where more than 95% of students are Hispanic. After searching the web, it seems that on average more than 60% of the students in these schools are learning English. All of these schools have English test scores below the state average, and some of them also score below the state average in mathematics. One reason for the low SAT scores may therefore be that students find the English portions of the SAT difficult.

In [25]:
combined[(combined['hispanic_per']<10)&(combined['sat_score']>1800)]['SCHOOL NAME']
Out[25]:
37                                STUYVESANT HIGH SCHOOL
151                         BRONX HIGH SCHOOL OF SCIENCE
187                       BROOKLYN TECHNICAL HIGH SCHOOL
327    QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO...
356                  STATEN ISLAND TECHNICAL HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

In these schools, all students, including Hispanic students, are scoring more than 95% in all of their subjects. None of the students here are learning English, and their test scores in English are between 99% and 100%. Reviews on the Great Schools website suggest that teachers here are strict and focus on pushing students towards greatness. However, several parents do not approve of this, arguing that every student is good at something different while the teachers focus heavily on studies.

Exploring Gender and SAT scores

In [26]:
gender_cols=['male_per','female_per']
gender=correlations[gender_cols]
positive_gender_corr=gender.apply(lambda x: True if x>=0 else False)
color_map_corr_ge=positive_gender_corr.map({True:'#33A1C9',False:'#FFA500'})
gender_coords={'female_per':0.02,'male_per':0.02}
fig,ax=plt.subplots(figsize=(4,2))
ax.grid(False)
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.barh(gender.index,gender,height=0.1,left=-0.1,color=color_map_corr_ge)
y_coord=0.97
for y_label,x_coord in gender_coords.items():
    ax.text(x_coord, y_coord, y_label,color=color_map_corr_ge[y_label])
    y_coord -= 1
ax.axvline(x=0.016,color='grey',alpha=0.3,linewidth=1,ymin=0.05,ymax=0.95)
ax.axhline(-0.07, color='grey', linewidth=1, alpha=0.5,
            xmin=0.05, xmax=0.95)
ax.text(-0.22, -0.2, '-0.1'+ ' '*40 + '+0.1',color='grey', alpha=0.5)
ax.text(-0.22, 1.35,
          'Females score more on SAT than Males',
          size=17, weight='bold')
ax.text(-0.22, 1.2,
          'Investigating the correlations between SAT scores and Gender')
plt.show()

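It seems that schools with a higher percentage of male students tend to have lower SAT scores than schools with a higher percentage of female students. These correlations are not strong, and because male_per and female_per add up to 100 for each school, opposite signs are largely expected by construction. It is still worthwhile to investigate the relationship further.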

In [27]:
combined.plot(x='female_per',y='sat_score',kind='scatter')
plt.show()

The following observations can be made from the plot above:

  • Most schools have between 40% and 60% female students.
  • Schools with more than 70% female students tend to score below 1600, as do schools with less than 35% female students.
  • Schools with 50-60% female students are the ones that tend to reach scores above 1600.
In [28]:
combined[(combined['female_per']>60)&(combined['sat_score']>1700)]['SCHOOL NAME']
Out[28]:
5                         BARD HIGH SCHOOL EARLY COLLEGE
26                         ELEANOR ROOSEVELT HIGH SCHOOL
60                                    BEACON HIGH SCHOOL
61     FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A...
302                          TOWNSEND HARRIS HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

These schools perform at or above the state average on test scores. However, they lean towards arts and humanities, especially BARD HIGH SCHOOL EARLY COLLEGE, which may mean that students find the subjects being taught easier and feel less pressure.

Exploring AP scores VS SAT scores

In [29]:
combined['ap_per']=(combined['AP Test Takers ']/combined['total_enrollment'])*100
In [30]:
combined.plot.scatter('ap_per','sat_score')
plt.show()

There seems to be no clear correlation between the percentage of AP test takers (ap_per) and SAT scores.
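
A scatter plot can be backed up with a number by calling Series.corr on the two columns. The sketch below uses invented values purely to show the call; in the notebook the equivalent would be combined["ap_per"].corr(combined["sat_score"]).

```python
import pandas as pd

# Made-up values, only to demonstrate the call; not taken from the dataset.
toy = pd.DataFrame({
    "ap_per": [5.0, 10.0, 20.0, 40.0, 80.0],
    "sat_score": [1100, 1250, 1150, 1400, 1200],
})

# Pearson correlation between the two columns (the default method).
r = toy["ap_per"].corr(toy["sat_score"])
print(round(r, 3))
```

One caveat when reading the real result: missing AP Test Takers values were filled with the column mean before ap_per was computed, so small schools without AP data may end up with inflated percentages.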

Exploring Class Size VS SAT Scores

In [31]:
combined.plot.scatter('AVERAGE CLASS SIZE','sat_score')
plt.show()

There seems to be a positive correlation between average class size and SAT scores:

  • Most schools have an average class size of 20-30 and SAT scores of around 1000-1400.
  • Schools with small classes (10-20) have low SAT scores (1000-1300).
  • Schools with large classes (30-40) tend to have higher SAT scores (1400-2100).

Boroughs with the Best Schools based on SAT

In [32]:
combined.groupby('boro')['sat_score'].agg(numpy.mean)
Out[32]:
boro
Bronx            1157.598203
Brooklyn         1181.364461
Manhattan        1278.331410
Queens           1286.753032
Staten Island    1382.500000
Name: sat_score, dtype: float64
In [33]:
combined.groupby('boro')['sat_score'].agg(numpy.max)
Out[33]:
boro
Bronx            1969.0
Brooklyn         1833.0
Manhattan        2096.0
Queens           1910.0
Staten Island    1953.0
Name: sat_score, dtype: float64
In [34]:
combined.groupby('boro')['sat_score'].agg(numpy.min)
Out[34]:
boro
Bronx             934.0
Brooklyn          887.0
Manhattan        1014.0
Queens            951.0
Staten Island    1195.0
Name: sat_score, dtype: float64

According to the data above, Staten Island has the highest average SAT score (1382.5), whereas the Bronx has the lowest (1157.6).
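
The three groupby calls above can be combined into one, and adding a count shows how many schools stand behind each average. This is a sketch on made-up rows; the notebook does not show the number of schools per borough, so no real counts are assumed here.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "boro": ["Bronx", "Bronx", "Queens", "Queens"],
    "sat_score": [1000.0, 1200.0, 1100.0, 1500.0],
})

# One table with mean, min, max, and the number of schools per borough.
summary = toy.groupby("boro")["sat_score"].agg(["mean", "min", "max", "count"])
print(summary)
```

The count column is useful context: an average over a handful of schools is much more sensitive to a single high-scoring school than an average over a hundred.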

Conclusion

Summarizing all the findings in one place.

  1. Schools with more student and parent survey respondents tend to have higher SAT scores.
  2. The safety and respect scores correlate positively with SAT scores.
  3. The academic expectations scores from teachers and parents correlate only weakly with SAT scores, while the score from students correlates strongly.
  4. Safety:
    • Schools that have a higher safety rating may provide a better environment for students to focus on their studies.
    • All boroughs have similar safety and respect scores; Brooklyn has the lowest score and Manhattan has the highest.
  5. Racial Disparity:
    • Schools with a higher percentage of white and Asian students tend to have higher SAT scores than schools with a higher percentage of Black and Hispanic students.
  6. Gender Inequality:
    • Several schools have roughly equal percentages of male and female students. Interestingly, schools with a balanced mix (40-60% female) perform better on the SAT.
  7. Class Size:
    • Schools with larger average class sizes perform better than schools with smaller average class sizes. This is the opposite of what one would expect, since smaller classes mean that every student can get more attention from the teacher.
  8. By Neighbourhood:
    • Schools in Staten Island have the highest average SAT score (1382.5), and schools in the Bronx have the lowest (1157.6).