Data source: https://nycopendata.socrata.com/data?cat=education
Data description:
NYS Math Exam results for NYC, grades 3-8, 2006-2011.
Proficiency Level 1, 2 - below the level appropriate for that grade
Proficiency Level 3 - appropriate for that grade
Proficiency Level 4 - above the level appropriate for that grade
------------------------------------------------------------------------------------------------
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
Manual data fixing done before reading the file here:
Removed the percentage columns (this I know how to do)
Added the Level 1 and Level 2 columns together to create a new one (I don't know how to do this in Python)
BoroMath = pd.read_csv("MathTest_Boro2.csv")
#saved locally in the same folder
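The manual spreadsheet steps noted above can probably be done in pandas instead. A minimal sketch, assuming the raw file has count columns named `Level1` and `Level2` and percentage columns whose names contain `%`; those names and the tiny inline table are hypothetical stand-ins, not the real file layout.

```python
import pandas as pd

# Hypothetical stand-in for the raw file; the real column names may differ.
raw = pd.DataFrame({
    'Borough': ['BRONX', 'BRONX'],
    'Level1': [100, 120],
    'Level2': [200, 210],
    'Level1 %': [10.0, 11.0],
    'Level2 %': [20.0, 19.5],
})

# Drop every column whose name contains a percent sign.
pct_cols = [c for c in raw.columns if '%' in c]
clean = raw.drop(columns=pct_cols)

# Adding two columns is element-wise, row by row.
clean['Level1&2'] = clean['Level1'] + clean['Level2']
print(clean)
```

Doing this in code keeps the original CSV untouched and makes the cleaning step repeatable.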
------------------------------------------------------------------------------------------------
BoroMath = BoroMath.rename(columns={'Level1&2': 'BelowAverage', 'Level3':'Average', 'Level4':'AboveAverage'})
#renamed the header for ease
BoroMath.head(3)
|   | Borough | Grade | Year | Category | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage | Level3&4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BRONX | 3 | 2006 | Female | 7984 | 664 | 2546 | 4232 | 1206 | 5438 |
| 1 | BRONX | 3 | 2006 | Male | 8461 | 663 | 2773 | 4386 | 1302 | 5688 |
| 2 | BRONX | 3 | 2007 | Female | 7803 | 675 | 1780 | 4410 | 1613 | 6023 |
3 rows × 10 columns
BoroMath.drop(['Level3&4'], axis=1, inplace=True)
#inplace=True modifies BoroMath itself and returns None, so the result is not assigned to a new name
#Got rid of the Level3&4 here for practice, could've deleted in file like the others.
GroupedBronxMath, GroupedBrooklynMath, etc. store all entries for that borough (grades 3-8 and "All Grades")
AllBronxMath, AllBrooklynMath, etc. store only the "All Grades" entries for that borough
AllStudents stores the "All Grades" entries for all of the boroughs
Q: What form does this become? A list?
Q: When would I care?
Q: Can I do math operations with this form?
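On the questions above: filtering with a condition returns another DataFrame rather than a list, and a single column of it is a Series. Both support arithmetic. A small sketch with made-up numbers that reuses the notebook's column names:

```python
import pandas as pd

# Tiny made-up table with the same column names as BoroMath.
df = pd.DataFrame({
    'Borough': ['BRONX', 'BRONX', 'QUEENS'],
    'Grade': ['3', 'All Grades', 'All Grades'],
    'Average': [10, 40, 50],
    'AboveAverage': [5, 20, 30],
})

# A boolean filter returns a DataFrame, not a list.
bronx = df[df['Borough'] == 'BRONX']
print(type(bronx).__name__)

# One column of it is a Series, and Series support element-wise math.
at_or_above = bronx['Average'] + bronx['AboveAverage']
print(at_or_above.tolist())

# Conditions can be combined with & (each one wrapped in parentheses).
bronx_all = df[(df['Borough'] == 'BRONX') & (df['Grade'] == 'All Grades')]
print(len(bronx_all))
```

The form matters mostly because a DataFrame keeps methods such as `sum`, `groupby`, and `head`, which a plain list would not have.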
GroupedBronxMath = BoroMath[BoroMath['Borough']=='BRONX']
#This con
AllBronxMath = GroupedBronxMath[GroupedBronxMath['Grade']=='All Grades']
#AllBronxMath is all grades for a borough
GroupedBrooklynMath = BoroMath[BoroMath['Borough']=='BROOKLYN']
AllBrooklynMath = GroupedBrooklynMath[GroupedBrooklynMath['Grade']=='All Grades']
GroupedManhattanMath = BoroMath[BoroMath['Borough']=='MANHATTAN']
AllManhattanMath = GroupedManhattanMath[GroupedManhattanMath['Grade']=='All Grades']
GroupedQueensMath = BoroMath[BoroMath['Borough']=='QUEENS']
AllQueensMath = GroupedQueensMath[GroupedQueensMath['Grade']=='All Grades']
GroupedSIMath = BoroMath[BoroMath['Borough']=='STATEN ISLAND']
AllSIMath = GroupedSIMath[GroupedSIMath['Grade']=='All Grades']
AllStudents = BoroMath[BoroMath['Grade']=='All Grades']
#There's 12 per borough
#6 per gender
#2 per year
AllStudents.head(12)
|   | Borough | Grade | Year | Category | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|---|---|---|---|
| 72 | BRONX | All Grades | 2006 | Female | 48244 | 644 | 26437 | 18403 | 3404 |
| 73 | BRONX | All Grades | 2006 | Male | 49616 | 642 | 27282 | 18682 | 3652 |
| 74 | BRONX | All Grades | 2007 | Female | 47011 | 654 | 21301 | 21034 | 4676 |
| 75 | BRONX | All Grades | 2007 | Male | 50061 | 651 | 23866 | 21195 | 5000 |
| 76 | BRONX | All Grades | 2008 | Female | 45661 | 662 | 15303 | 25117 | 5241 |
| 77 | BRONX | All Grades | 2008 | Male | 48700 | 659 | 18005 | 25219 | 5476 |
| 78 | BRONX | All Grades | 2009 | Female | 45423 | 671 | 10599 | 27585 | 7239 |
| 79 | BRONX | All Grades | 2009 | Male | 48521 | 668 | 13256 | 27851 | 7414 |
| 80 | BRONX | All Grades | 2010 | Female | 45466 | 670 | 26240 | 13502 | 5724 |
| 81 | BRONX | All Grades | 2010 | Male | 48687 | 668 | 28920 | 13910 | 5857 |
| 82 | BRONX | All Grades | 2011 | Female | 45598 | 672 | 24794 | 16042 | 4762 |
| 83 | BRONX | All Grades | 2011 | Male | 48423 | 669 | 27535 | 15843 | 5045 |
12 rows × 9 columns
AllStudents.describe()
|   | Year | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|---|
| count | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 |
| mean | 2008.500000 | 42922.066667 | 672.683333 | 15044.266667 | 18976.133333 | 8901.666667 |
| std | 1.722237 | 20169.577332 | 11.342872 | 9136.840989 | 9767.739463 | 5341.662627 |
| min | 2006.000000 | 12431.000000 | 642.000000 | 1639.000000 | 4452.000000 | 2493.000000 |
| 25% | 2007.000000 | 26797.000000 | 665.750000 | 6785.250000 | 9237.000000 | 4414.000000 |
| 50% | 2008.500000 | 48333.500000 | 673.500000 | 13360.000000 | 19191.500000 | 6629.000000 |
| 75% | 2010.000000 | 60109.250000 | 682.000000 | 22884.500000 | 27596.750000 | 13857.250000 |
| max | 2011.000000 | 70730.000000 | 688.000000 | 32470.000000 | 36905.000000 | 19667.000000 |
8 rows × 6 columns
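This doesn't tell much: describe() pools all 60 rows (5 boroughs × 6 years × 2 genders), so boroughs of very different sizes are mixed together and Year is summarized as if it were a measurement.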
AllStudentsYr = AllStudents.groupby(['Year','Category']).sum()
AllStudentsYr
| Year | Category | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|---|
| 2006 | Female | 216807 | 3289 | 90995 | 93221 | 32591 |
|   | Male | 223781 | 3282 | 96436 | 93633 | 33712 |
| 2007 | Female | 211962 | 3335 | 71104 | 99512 | 41346 |
|   | Male | 222919 | 3322 | 80398 | 100880 | 41641 |
| 2008 | Female | 206949 | 3370 | 49665 | 111877 | 45407 |
|   | Male | 217858 | 3357 | 59549 | 112195 | 46114 |
| 2009 | Female | 206320 | 3408 | 34282 | 116928 | 55110 |
|   | Male | 217072 | 3394 | 42827 | 119581 | 54664 |
| 2010 | Female | 207155 | 3404 | 92805 | 66925 | 47425 |
|   | Male | 218583 | 3393 | 103001 | 68683 | 46899 |
| 2011 | Female | 207503 | 3409 | 85770 | 77753 | 43980 |
|   | Male | 218415 | 3398 | 95824 | 77380 | 45211 |
12 rows × 5 columns
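One caveat about the table above: summing `Mean Scale Score` adds up five borough means (3289 for 2006 Female, for example), which is not a value on the exam scale. A citywide mean would presumably weight each borough's mean by its `Number Tested`. A minimal sketch with invented numbers, not the real data:

```python
import pandas as pd

# Invented rows: two boroughs, one year.
rows = pd.DataFrame({
    'Year': [2006, 2006],
    'Number Tested': [100, 300],
    'Mean Scale Score': [640, 680],
})

# Weight each borough's mean by how many students it tested.
rows['weighted'] = rows['Number Tested'] * rows['Mean Scale Score']
totals = rows.groupby('Year')[['weighted', 'Number Tested']].sum()

# Weighted mean per year = sum of (count * mean) / sum of counts.
citywide_mean = totals['weighted'] / totals['Number Tested']
print(citywide_mean)
```

The count columns (`BelowAverage`, `Average`, `AboveAverage`) do not have this problem, because counts can be summed directly.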
Allgirls = AllStudents[AllStudents['Category']=='Female']
Allgirls.groupby(['Year']).sum()
| Year | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|
| 2006 | 216807 | 3289 | 90995 | 93221 | 32591 |
| 2007 | 211962 | 3335 | 71104 | 99512 | 41346 |
| 2008 | 206949 | 3370 | 49665 | 111877 | 45407 |
| 2009 | 206320 | 3408 | 34282 | 116928 | 55110 |
| 2010 | 207155 | 3404 | 92805 | 66925 | 47425 |
| 2011 | 207503 | 3409 | 85770 | 77753 | 43980 |
6 rows × 5 columns
Allboys = AllStudents[AllStudents['Category']=='Male']
Allboys.groupby(['Year']).sum()
| Year | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|
| 2006 | 223781 | 3282 | 96436 | 93633 | 33712 |
| 2007 | 222919 | 3322 | 80398 | 100880 | 41641 |
| 2008 | 217858 | 3357 | 59549 | 112195 | 46114 |
| 2009 | 217072 | 3394 | 42827 | 119581 | 54664 |
| 2010 | 218583 | 3393 | 103001 | 68683 | 46899 |
| 2011 | 218415 | 3398 | 95824 | 77380 | 45211 |
6 rows × 5 columns
AllBronxMath.sum()
Borough             BRONXBRONXBRONXBRONXBRONXBRONXBRONXBRONXBRONXB...
Grade               All GradesAll GradesAll GradesAll GradesAll Gr...
Year                24102
Category            FemaleMaleFemaleMaleFemaleMaleFemaleMaleFemale...
Number Tested       571411
Mean Scale Score    7930
BelowAverage        263538
Average             244383
AboveAverage        63490
dtype: object
Bx_below = AllBronxMath[AllBronxMath['Year']==2011].sum().BelowAverage
Bx_avg = AllBronxMath[AllBronxMath['Year']==2011].sum().Average
Bx_above = AllBronxMath[AllBronxMath['Year']==2011].sum().AboveAverage
M_below = AllManhattanMath[AllManhattanMath['Year']==2011].sum().BelowAverage
M_avg = AllManhattanMath[AllManhattanMath['Year']==2011].sum().Average
M_above = AllManhattanMath[AllManhattanMath['Year']==2011].sum().AboveAverage
Bk_below = AllBrooklynMath[AllBrooklynMath['Year']==2011].sum().BelowAverage
Bk_avg = AllBrooklynMath[AllBrooklynMath['Year']==2011].sum().Average
Bk_above = AllBrooklynMath[AllBrooklynMath['Year']==2011].sum().AboveAverage
Q_below = AllQueensMath[AllQueensMath['Year']==2011].sum().BelowAverage
Q_avg = AllQueensMath[AllQueensMath['Year']==2011].sum().Average
Q_above = AllQueensMath[AllQueensMath['Year']==2011].sum().AboveAverage
SI_below = AllSIMath[AllSIMath['Year']==2011].sum().BelowAverage
SI_avg = AllSIMath[AllSIMath['Year']==2011].sum().Average
SI_above = AllSIMath[AllSIMath['Year']==2011].sum().AboveAverage
NYC_Below = [int(Bx_below), int(M_below), int(Bk_below), int(Q_below), int(SI_below)]
NYC_Avg = [int(Bx_avg), int(M_avg), int(Bk_avg), int(Q_avg), int(SI_avg)]
NYC_Above = [int(Bx_above), int(M_above), int(Bk_above), int(Q_above), int(SI_above)]
#Borough totals are plain numbers (not one-item lists) so they can be used as divisors
NYC_Bxtot = int(Bx_below) + int(Bx_avg) + int(Bx_above)
NYC_Mtot = int(M_below) + int(M_avg) + int(M_above)
NYC_Bktot = int(Bk_below) + int(Bk_avg) + int(Bk_above)
NYC_Qtot = int(Q_below) + int(Q_avg) + int(Q_above)
NYC_SItot = int(SI_below) + int(SI_avg) + int(SI_above)
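The near-identical per-borough lines above could probably be collapsed into one `groupby`. A sketch, assuming a table with the same column names as `AllStudents`; the numbers below are invented for illustration.

```python
import pandas as pd

# Invented rows shaped like AllStudents (one row per borough, year, gender).
rows = pd.DataFrame({
    'Borough': ['BRONX', 'BRONX', 'QUEENS', 'QUEENS', 'BRONX'],
    'Year': [2011, 2011, 2011, 2011, 2010],
    'Category': ['Female', 'Male', 'Female', 'Male', 'Female'],
    'BelowAverage': [10, 20, 5, 6, 99],
    'Average': [30, 40, 7, 8, 99],
    'AboveAverage': [1, 2, 3, 4, 99],
})

levels = ['BelowAverage', 'Average', 'AboveAverage']

# Keep one year, then sum boys and girls within each borough.
by_boro = rows[rows['Year'] == 2011].groupby('Borough')[levels].sum()

# Row-wise total of the three levels for each borough.
by_boro['Total'] = by_boro[levels].sum(axis=1)
print(by_boro)
```

Each column of `by_boro` then plays the role of one of the `NYC_Below`, `NYC_Avg`, and `NYC_Above` lists, with the borough names attached as the index.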
____________________________________________________________________________________________________________________________
#Scores... proficiency Level...
plt.figure(figsize=(10,5))
plt.scatter(AllStudents.Year, AllStudents.BelowAverage, lw=10, alpha=.5, color='m')
plt.scatter(AllStudents.Year, AllStudents.Average, lw=10, alpha=.5, color='c')
plt.scatter(AllStudents.Year, AllStudents.AboveAverage, lw=10, alpha=.5, color='g')
plt.xlabel("Year")
plt.xticks(range(2006, 2012))
#pyplot has no set_xticklabels (that is an Axes method); plt.xticks sets the tick positions here
plt.ylabel("Students at Proficiency")
plt.title("Proficiency Over Time",fontsize='15')
plt.legend(('Below Average', 'Average', 'Above Average'), bbox_to_anchor = (1.3, 1))
<matplotlib.legend.Legend at 0x10729f7d0>
Q: Why did I have to rename the column from two words to one word in order to sum it? Ex: It was Below Average, now it's BelowAverage
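On the question above: the rename was probably not strictly required. Dot access (`df.BelowAverage`) only works when the column name is a valid Python identifier, while bracket access works with any name, spaces included. A small sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'Below Average': [1, 2, 3], 'Average': [4, 5, 6]})

# Bracket access works for any column name, spaces included.
total_below = df['Below Average'].sum()

# Dot access only works when the name is a valid Python identifier.
total_avg = df.Average.sum()

# The same applies to the result of sum(): index it with brackets.
sums = df.sum()
print(total_below, total_avg, sums['Below Average'])
```

Renaming to one word is still convenient if dot access is preferred throughout the notebook.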
____________________________________________________________________________________________________________________________
N = 6
BoysLevel3 = Allboys.groupby(['Year']).sum().Average
GirlsLevel3 = Allgirls.groupby(['Year']).sum().Average
ind = np.arange(N)
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(ind, BoysLevel3, width, color='b')
rects2 = ax.bar(ind+width, GirlsLevel3, width, color='y')
#Axis label, title, and tick labels
ax.set_ylabel('Number of Students')
ax.set_title('Number of Students at Average Proficiency in Math')
ax.set_xticks(ind+width)
ax.set_xticklabels( ('2006', '2007', '2008', '2009', '2010', '2011') )
#Legend handles are the first bar of each series
ax.legend( (rects1[0], rects2[0]), ('Boys', 'Girls'), loc='upper right' )
<matplotlib.legend.Legend at 0x10783f1d0>
____________________________________________________________________________________________________________________________
N = 6
BoysLevel12 = Allboys.groupby(['Year']).sum().BelowAverage
GirlsLevel12 = Allgirls.groupby(['Year']).sum().BelowAverage
ind = np.arange(N)
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(ind, BoysLevel12, width, color='b')
rects2 = ax.bar(ind+width, GirlsLevel12, width, color='y')
#Axis label, title, and tick labels
ax.set_ylabel('Number of Students')
ax.set_title('Number of Students at Below Average Proficiency in Math')
ax.set_xticks(ind+width)
ax.set_xticklabels( ('2006', '2007', '2008', '2009', '2010', '2011') )
#Legend handles are the first bar of each series
ax.legend( (rects1[0], rects2[0]), ('Boys', 'Girls'), bbox_to_anchor = (1.3, 1) )
<matplotlib.legend.Legend at 0x107a0db90>
____________________________________________________________________________________________________________________________
N = 6
BoysLevel4 = Allboys.groupby(['Year']).sum().AboveAverage
GirlsLevel4 = Allgirls.groupby(['Year']).sum().AboveAverage
ind = np.arange(N)
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(ind, BoysLevel4, width, color='b')
rects2 = ax.bar(ind+width, GirlsLevel4, width, color='y')
#Axis label, title, and tick labels
ax.set_ylabel('Number of Students')
ax.set_title('Number of Students at Above Average Proficiency in Math')
ax.set_xticks(ind+width)
ax.set_xticklabels( ('2006', '2007', '2008', '2009', '2010', '2011') )
#Legend handles are the first bar of each series
ax.legend( (rects1[0], rects2[0]), ('Boys', 'Girls'), loc='upper right' )
<matplotlib.legend.Legend at 0x107a53e50>
Q: But how do I fix the years... can I modify the tick names??
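On the tick question: tick positions and tick names can be set separately on the Axes. A minimal sketch with invented bar heights; the `Agg` backend line is only there so the example runs without a display and is not needed inside the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

years = [2006, 2007, 2008, 2009, 2010, 2011]
counts = [3, 4, 5, 6, 2, 3]  # invented values, only here to have bars to label
ind = np.arange(len(years))

fig, ax = plt.subplots()
ax.bar(ind, counts, 0.35)

# Tick positions and tick names are set separately on the Axes.
ax.set_xticks(ind)
ax.set_xticklabels([str(y) for y in years])

# The pyplot equivalent takes positions and names in one call:
# plt.xticks(ind, [str(y) for y in years])

labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Building the names from the data (here `years`) avoids retyping them for every chart.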
____________________________________________________________________________________________________________________________
PIE1 = [181594, 155133, 89191]
#2011 citywide totals, i.e. sum(NYC_Below), sum(NYC_Avg), sum(NYC_Above)
labelsP = ('Below Average', 'Average', 'Above Average')
plt.subplot(aspect='equal')
plt.pie(PIE1, labels=labelsP, colors = ('y', 'm', 'b'),autopct='%i%%')
plt.title("NYC Math Proficiency Level Grade 3-8, 2011")
<matplotlib.text.Text at 0x107a711d0>
Q: How do I do percentage?!?!
I need to do: Bx_below/NYC_Bxtot *100
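A sketch of the percentage calculation, using the Bronx 2011 counts (Female + Male) from the AllStudents table above. The lowercase variable names are only for this example.

```python
# Bronx 2011 counts, Female + Male, taken from the AllStudents table above.
bx_below = 24794 + 27535
bx_avg = 16042 + 15843
bx_above = 4762 + 5045

# The total must be a plain number (not a one-item list) to divide by it.
bx_total = bx_below + bx_avg + bx_above

# Multiplying by 100.0 keeps the division in floating point.
bx_pct = [100.0 * n / bx_total for n in (bx_below, bx_avg, bx_above)]
print([round(p, 1) for p in bx_pct])
```

The same pattern should apply to the other boroughs once their 2011 counts are summed.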
____________________________________________________________________________________________________________________________
#NYC_Below = [Bx_below, M_below, Bk_below, Q_below, SI_below]
#NYC_Avg = [Bx_avg, M_avg, Bk_avg, Q_avg, SI_avg]
#NYC_Above = [Bx_above, M_above, Bk_above, Q_above, SI_above]
N = 5
ind = np.arange(N)
#Three bars per borough, side by side
width = 0.25
fig, ax = plt.subplots(figsize=(10,5))
rects1 = ax.bar(ind, NYC_Below, width, color='y')
rects2 = ax.bar(ind+width, NYC_Avg, width, color='m')
rects3 = ax.bar(ind+2*width, NYC_Above, width, color='b')
ax.set_ylabel('Students')
ax.set_title('Student Proficiency Levels by Borough 2011', fontsize = 15)
ax.set_xticks(ind+width)
ax.set_xticklabels(('Bronx', 'Manhattan', 'Brooklyn', 'Queens', 'Staten Island'))
ax.legend( (rects1[0], rects2[0], rects3[0]), ('Below Average', 'Average', 'Above Average'), fontsize=10, loc='upper right' )
<matplotlib.legend.Legend at 0x107d1ce90>
____________________________________________________________________________________________________________________________
(Stacked View)
#NYC_Below = [Bx_below, M_below, Bk_below, Q_below, SI_below]
#NYC_Avg = [Bx_avg, M_avg, Bk_avg, Q_avg, SI_avg]
#NYC_Above = [Bx_above, M_above, Bk_above, Q_above, SI_above]
N = 5
Boro_Below= (55.7, 40.13, 43.5, 34.5, 34.7)
Std1= (1,1,1,1,1)
Boro_Avg= (33.9, 34.7, 36, 38.6, 41.0)
Std2= (1,1,1,1,1)
Boro_Above= (10.4, 25.2, 20.6, 26.9, 24.0)
Std3= (1,1,1,1,1)
ind = np.arange(N)
width = 0.35
margin = 0.8
#width = (1.-2.*margin)/N
#fig, ax = plt.subplots(figsize=(10,5))
p1 = plt.bar(ind,Boro_Below, width, color='y')
p2 = plt.bar(ind,Boro_Avg, width, color='m', bottom=Boro_Below)
p3 = plt.bar(ind,Boro_Above, width, color='b', bottom=[Boro_Below[j] + Boro_Avg[j] for j in range(len(Boro_Below))])
plt.ylabel('Students %')
plt.title('Student Proficiency Levels by Borough 2011', fontsize = 15)
plt.xticks(ind+width/2., ('Bronx', 'Manhattan', 'Brooklyn', 'Queens', 'Staten Island') )
#plt.yticks(np.arange(0,81,10))
plt.legend((p1[0], p2[0], p3[0]),('Below Average', 'Average', 'Above Average'),fontsize=10, bbox_to_anchor = (1.4, 1))
plt.show()