Copyright 2021 Allen B. Downey
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
I recently encountered a 2009 paper by Ceci, Williams, and Barnett, "Women's underrepresentation in science: sociocultural and biological considerations", which lists in the abstract these "factors unique to underrepresentation [of women] in math-intensive fields":
(a) Math-proficient women disproportionately prefer careers in non–math-intensive fields and are more likely to leave math-intensive careers as they advance;
(b) more men than women score in the extreme math-proficient range on gatekeeper tests, such as the SAT Mathematics and the Graduate Record Examinations Quantitative Reasoning sections;
(c) women with high math competence are disproportionately more likely to have high verbal competence, allowing greater choice of professions; and
(d) in some math-intensive fields, women with children are penalized in promotion rates.
To people familiar with this area of research, none of these are surprising, but the third caught my attention because I recently looked at the correlation between math and verbal scores on the SAT and ACT. In general, they are highly correlated, with $r$ around 0.7, and they are equally correlated for men and women. So I was curious to know where this claim comes from and, if it is true, how big a factor it might be.
As evidence, Ceci et al. summarize results from "a tracking study of 1,100 high–mathematics aptitude students who expressed a goal of majoring in mathematics or science in college", which found that
One determinant of who switched out of math/science fields was the asymmetry between their verbal and mathematics abilities. Women's verbal abilities on average were nearly as strong as their mathematics abilities (only 61 points difference between their SAT-V and SAT-M), leading them to enter professions that prized verbal reasoning (e.g., law), whereas men's verbal abilities were an average of 115 points lower than their mathematics ability, possibly leading them to view mathematics as their only strength.
And they cite Achter, Lubinski, Benbow, & Eftekhari-Sanjani, 1999 and Wai, Lubinski, & Benbow, 2005.
I don't have access to their data, but I ran a similar analysis with data from the National Longitudinal Survey of Youth 1997 (NLSY97), which "follows the lives of a sample of [8,984] American youth born between 1980-84". The public data set includes the participants' scores on several standardized tests, including the SAT and ACT. Assuming that most participants took these exams when they were 17, they probably took them between 1997 and 2001.
I found that the pattern described by Ceci et al. also appears in this dataset. Although the correlation between math and verbal scores is the same for men and women, the slope of the regression line is not. In a group of male and female test-takers with the same math score, the verbal scores for the female test-takers are higher, on average. Near the high end of the range, the difference is about 34 points, which is a little smaller than the difference in the previous study, 54 points.
So we might ask:
Is this a big enough difference that it seems likely to affect career choices? For example, suppose Student A has scores M 750 V 660 and Student B has scores M 750 V 690. Do you think A would be substantially more likely than B to "view mathematics as their only strength"?
If we assume that the answer is yes, and that both students make career choices accordingly, how big an effect would this have on the sex ratios we see in math-intensive fields?
I don't have the data to answer the first question, but we can use the data we have, and a model of the filtering processes, to put an upper bound on the second.
To summarize the results, the largest effect I found for factor (c) is that it might increase the sex ratio in a math-intensive field by 7-14%. For example, if the minimum for a math-intensive job is 700 on the math section, the sex ratio among the people who meet this requirement is 1.8:1.
Now suppose that everyone who meets this standard takes a math-intensive job, EXCEPT the people who also get 700 or more on the verbal section. If all of those people choose a different career, the sex ratio of the ones left in the math-intensive job goes up to 2.0:1.
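To make that arithmetic concrete, here is the summary calculation as code (a sketch using percentages computed later in this notebook):

# Percent of each group scoring 700 or more on the math section (from the data below)
pct_math_male, pct_math_female = 6.17, 3.47
print(pct_math_male / pct_math_female)      # about 1.8

# Percent scoring 700+ on math but below 700 on verbal
pct_left_male, pct_left_female = 4.32, 2.13
print(pct_left_male / pct_left_female)      # about 2.0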
To see what happens as we move farther into the tail, I used the data to create a Gaussian model, and used the model to simulate test scores beyond the range of the SAT. With this model, we see that the effect of factor (b) increases as we make the requirements stricter.
For example, if the threshold score is 800 for the math and verbal sections, the sex ratio among the people who meet the math requirement is 4.6:1. If the people who also meet the verbal requirement choose different careers, the sex ratio among the people left behind is 4.9:1 (an increase of about 7%). So it seems like the effect of factor (c) gets smaller as we go farther into the tails of the distributions.
Finally, I use the model to decompose factor (b) into two parts: the difference in means and the difference in variance. When the threshold is 800, the contribution of these two parts is about equal; that is:
If we set the means to be the same, but preserve the difference in variance, the sex ratio among people who meet the math requirement is about 2.2:1.
If we set the variances to be the same, but preserve the difference in means, the sex ratio among people who meet the math requirement is about 2.2:1.
The following cell downloads the data file.
from os.path import basename, exists

def download(url):
    """Download a file if it's not already in the working directory."""
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

download('https://github.com/AllenDowney/ProbablyOverthinkingIt2/raw/master/' +
         'nlsy/stand_test_corr.csv.gz')
And I'll read the data into a Pandas DataFrame.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(17)
nlsy = pd.read_csv('stand_test_corr.csv.gz')
nlsy.shape
(8984, 29)
nlsy.head()
| | R0000100 | R0536300 | R0536401 | R0536402 | R1235800 | R1482600 | R5473600 | R5473700 | R7237300 | R7237400 | ... | R9794001 | R9829600 | S1552600 | S1552700 | Z9033700 | Z9033800 | Z9033900 | Z9034000 | Z9034100 | Z9034200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 9 | 1981 | 1 | 4 | -4 | -4 | -4 | -4 | ... | 1998 | 45070 | -4 | -4 | 4 | 3 | 3 | 2 | -4 | -4 |
| 1 | 2 | 1 | 7 | 1982 | 1 | 2 | -4 | -4 | -4 | -4 | ... | 1998 | 58483 | -4 | -4 | 4 | 5 | 4 | 5 | -4 | -4 |
| 2 | 3 | 2 | 9 | 1983 | 1 | 2 | -4 | -4 | -4 | -4 | ... | -4 | 27978 | -4 | -4 | 2 | 4 | 2 | 4 | -4 | -4 |
| 3 | 4 | 2 | 2 | 1981 | 1 | 2 | -4 | -4 | -4 | -4 | ... | -4 | 37012 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 |
| 4 | 5 | 1 | 10 | 1982 | 1 | 2 | -4 | -4 | -4 | -4 | ... | -4 | -4 | -4 | -4 | 2 | 3 | 6 | 3 | -4 | -4 |

5 rows × 29 columns
The columns are documented in the codebook. The following dictionary maps the current column names to more memorable names.
d = {'R9793200': 'psat_math',
     'R9793300': 'psat_verbal',
     'R9793400': 'act_comp',
     'R9793500': 'act_eng',
     'R9793600': 'act_math',
     'R9793700': 'act_read',
     'R9793800': 'sat_verbal',
     'R9793900': 'sat_math',
     'R0536300': 'sex',
     }
nlsy.rename(columns=d, inplace=True)
There are 4599 male and 4385 female participants.
nlsy['sex'].value_counts()
1    4599
2    4385
Name: sex, dtype: int64
The SAT data includes a few cases where the scores are less than 200, so let's clean those up.
varnames = ['sat_verbal', 'sat_math']

for varname in varnames:
    # scores below 200 are out of range for the SAT, so mark them invalid
    invalid = (nlsy[varname] < 200)
    nlsy.loc[invalid, varname] = np.nan
male = nlsy[nlsy['sex'] == 1]
female = nlsy[nlsy['sex'] == 2]
1400 of the participants took the SAT. Their average and standard deviation are close to the national average (500) and standard deviation (100).
nlsy['sat_verbal'].describe()
count    1400.000000
mean      501.678571
std       108.343678
min       200.000000
25%       430.000000
50%       500.000000
75%       570.000000
max       800.000000
Name: sat_verbal, dtype: float64
nlsy['sat_math'].describe()
count    1399.000000
mean      503.213009
std       109.901382
min       200.000000
25%       430.000000
50%       500.000000
75%       580.000000
max       800.000000
Name: sat_math, dtype: float64
On the verbal section, the male and female averages are roughly the same. The male scores are a little more variable.
male['sat_verbal'].describe()
count    649.000000
mean     502.681048
std      111.764295
min      200.000000
25%      430.000000
50%      500.000000
75%      580.000000
max      800.000000
Name: sat_verbal, dtype: float64
female['sat_verbal'].describe()
count    751.000000
mean     500.812250
std      105.365425
min      200.000000
25%      430.000000
50%      490.000000
75%      570.000000
max      800.000000
Name: sat_verbal, dtype: float64
On the math section the male average is substantially higher (518 compared to 491) and the male scores are substantially more variable (std 115 compared to 104).
male['sat_math'].describe()
count    648.000000
mean     517.608025
std      114.682496
min      220.000000
25%      440.000000
50%      520.000000
75%      600.000000
max      800.000000
Name: sat_math, dtype: float64
female['sat_math'].describe()
count    751.000000
mean     490.792277
std      104.089408
min      200.000000
25%      420.000000
50%      480.000000
75%      550.000000
max      790.000000
Name: sat_math, dtype: float64
The correlation between the sections is high, about 0.73 overall.
nlsy[varnames].corr()
| | sat_verbal | sat_math |
|---|---|---|
| sat_verbal | 1.000000 | 0.734739 |
| sat_math | 0.734739 | 1.000000 |
And the correlation is about the same for both groups.
male[varnames].corr()
| | sat_verbal | sat_math |
|---|---|---|
| sat_verbal | 1.000000 | 0.736466 |
| sat_math | 0.736466 | 1.000000 |
female[varnames].corr()
| | sat_verbal | sat_math |
|---|---|---|
| sat_verbal | 1.000000 | 0.742472 |
| sat_math | 0.742472 | 1.000000 |
Although the correlations are the same, the regression lines are not.
def decorate(**options):
    """Decorate the current axes.

    Call decorate with keyword arguments like

        decorate(title='Title',
                 xlabel='x',
                 ylabel='y')

    The keyword arguments can be any of the axis properties:
    https://matplotlib.org/api/axes_api.html
    """
    ax = plt.gca()
    ax.set(**options)

    handles, labels = ax.get_legend_handles_labels()
    if handles:
        ax.legend(handles, labels)

    plt.tight_layout()
from statsmodels.nonparametric.smoothers_lowess import lowess

def make_lowess(x, y):
    """Use LOWESS to compute a smooth line.

    x: sequence of math scores
    y: sequence of verbal scores

    returns: pd.Series of smoothed y values, indexed by x
    """
    smooth = lowess(y, x)
    index, data = np.transpose(smooth)
    return pd.Series(data, index=index)
def plot_lowess(df, color='C0', **options):
    """Scatter plot of verbal vs math scores with a LOWESS smooth line."""
    x, y = df['sat_math'], df['sat_verbal']
    plt.plot(x, y, '.', ms=2, alpha=0.2, color=color)

    smooth = make_lowess(x, y)
    smooth.plot(color=color, **options)

    decorate(xlabel='SAT Math',
             ylabel='SAT Verbal',
             title='Verbal score vs math score')
The following figure shows a scatterplot of verbal and math scores for male and female participants, along with a local regression line.
plot_lowess(female, 'C1', label='female')
plot_lowess(male, label='male')
For a given math score, the female participants have a higher verbal score, and this gap seems to be wider at the high end of the range.
For male participants who got a 750 on the math section, the average score on the verbal section is 659.
smooth = make_lowess(male['sat_math'], male['sat_verbal'])
smooth[750]
658.8914941639335
750 - smooth[750]
91.10850583606646
For female participants who got a 750 on the math section, the average score on the verbal section is 693.
smooth = make_lowess(female['sat_math'], female['sat_verbal'])
smooth[750]
750.0    693.277281
750.0    693.277281
750.0    693.277281
dtype: float64
750 - smooth[750]
750.0    56.722719
750.0    56.722719
750.0    56.722719
dtype: float64
This is consistent with the result Ceci et al. reported from a previous study:
Women's verbal abilities on average were nearly as strong as their mathematics abilities (only 61 points difference between their SAT-V and SAT-M), leading them to enter professions that prized verbal reasoning (e.g., law), whereas men's verbal abilities were an average of 115 points lower than their mathematics ability, possibly leading them to view mathematics as their only strength.
In the NLSY dataset, these differences are a little smaller, 57 points for women and 91 for men. So we might ask:
Is this a big enough difference that it seems likely to affect career choices? For example, suppose Student A has scores M 750 V 660 and Student B has scores M 750 V 690. Do you think A would be substantially more likely than B to "view mathematics as their only strength"?
If we assume that the answer is yes, and that both students make career choices accordingly, how big an effect would this have on the sex ratios we see in math-intensive fields?
I don't have the data to answer the first question, but we can use the data we have, and a model of the filtering processes, to put an upper bound on the second.
As a simple model of the world, let's suppose that there are two jobs:
A math-intensive job that requires a math SAT score of 700 or more.
A math-and-verbal-intensive job that requires a math SAT score of 700 or more AND a verbal SAT score of 700 or more.
And let's suppose that these jobs are so appealing that
If someone is qualified for the math-and-verbal job, they will choose to do it;
Otherwise, if they are qualified for the math job, they will choose to do it;
Otherwise they will do something else.
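Expressed as code, this decision rule might look like the following (a hypothetical helper for illustration; the analysis below works with group counts rather than individual decisions):

def choose_job(math, verbal, thresh_math=700, thresh_verbal=700):
    # hypothetical career-choice rule implied by the model above
    if math >= thresh_math and verbal >= thresh_verbal:
        return 'math-and-verbal job'
    elif math >= thresh_math:
        return 'math job'
    else:
        return 'something else'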
The following function simulates this filtering process, computing:
The number of people who meet the math requirement, and their fraction of the population;
The fraction of people who meet the verbal requirement, given that they meet the math requirement; and
The number of people who meet the math requirement and NOT the verbal requirement, and their fraction of the population.
def simulate_filter(df, thresh_math, thresh_verbal):
    subset = df.dropna(subset=['sat_math', 'sat_verbal'])
    high_math = subset['sat_math'] >= thresh_math
    high_verbal = subset['sat_verbal'] >= thresh_verbal

    n = len(subset)
    n_math = high_math.sum()
    n_math_and_verbal = (high_math & high_verbal).sum()
    n_math_no_verbal = (high_math & ~high_verbal).sum()

    result = dict(n=n,
                  n_math=n_math,
                  n_math_no_verbal=n_math_no_verbal,
                  pct_math=n_math/n*100,
                  pct_math_no_verbal=n_math_no_verbal/n*100,
                  pct_verbal_given_math=n_math_and_verbal/n_math*100,
                  )
    return result
Among male participants, 6.2% meet the math requirement; 30% of them also meet the verbal requirement, which means that 4.3% of male participants meet the math requirement and NOT the verbal requirement.
percents_male = simulate_filter(male, 700, 700)
percents_male
{'n': 648, 'n_math': 40, 'n_math_no_verbal': 28, 'pct_math': 6.172839506172839, 'pct_math_no_verbal': 4.320987654320987, 'pct_verbal_given_math': 30.0}
Among female participants, 3.5% meet the math requirement; 38% of them also meet the verbal requirement, which means that 2.1% of female participants meet the math requirement and NOT the verbal requirement.
percents_female = simulate_filter(female, 700, 700)
percents_female
{'n': 750, 'n_math': 26, 'n_math_no_verbal': 16, 'pct_math': 3.4666666666666663, 'pct_math_no_verbal': 2.1333333333333333, 'pct_verbal_given_math': 38.46153846153847}
We can use the percentages in the previous section to compute the sex ratios we would see in the math-intensive job, which requires a high math score only.
In the following function, sex_ratio1 is the ratio of men to women who meet the math requirement; sex_ratio2 is the ratio of men to women who meet the math requirement and NOT the verbal requirement.
def compute_ratios(pct1, pct2):
    """Compute ratios from male (pct1) and female (pct2) results."""
    result = {}
    result['sex_ratio1'] = pct1['pct_math'] / pct2['pct_math']
    result['verbal_ratio'] = pct2['pct_verbal_given_math'] / pct1['pct_verbal_given_math']
    result['sex_ratio2'] = pct1['pct_math_no_verbal'] / pct2['pct_math_no_verbal']
    result['factor_c'] = result['sex_ratio2'] / result['sex_ratio1']
    return result
from pprint import pprint
pprint(compute_ratios(percents_male, percents_female))
{'factor_c': 1.1374999999999997,
 'sex_ratio1': 1.7806267806267808,
 'sex_ratio2': 2.025462962962963,
 'verbal_ratio': 1.2820512820512822}
These results are consistent with factors (b) and (c) as listed by Ceci et al:
(b) more men than women score in the extreme math-proficient range on gatekeeper tests, such as the SAT Mathematics and the Graduate Record Examinations Quantitative Reasoning sections;
(c) women with high math competence are disproportionately more likely to have high verbal competence, allowing greater choice of professions; and
In this dataset, men are 1.8 times more likely to have an SAT math score of 700 or more. But women who meet the math requirement are 1.3 times more likely to ALSO meet the verbal requirement.
So in a job that has a math requirement but no verbal requirement, we expect to find a sex ratio near 2:1, that is, 2 men for every 1 woman.
Under these conditions, the effect of factor (c) is to increase the sex ratio in the math-intensive job from 1.8 to 2.0, an increase of 14%.
For this analysis, I set both threshold scores to 700 so that the number of participants who meet the requirements is big enough to make reasonable estimates of these ratios. But if we make the thresholds any higher, we get into very small sample sizes. So, before we go on, I will extrapolate the dataset using a Gaussian model.
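For example, we can count how many participants clear higher math thresholds (a quick check; the exact counts are not reported here):

# number of participants at or above each math threshold
for thresh in [700, 750, 800]:
    n_male = (male['sat_math'] >= thresh).sum()
    n_female = (female['sat_math'] >= thresh).sum()
    print(thresh, n_male, n_female)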
The following function takes a dataset and computes the summary statistics we'll use as parameters of the model:
The mean and standard deviation of both test scores.
A regression model of verbal scores as a function of math scores, including the slope and intercept of the best fit line and the standard deviation of the residuals.
from scipy.stats import linregress
def run_regress(df):
    subset = df.dropna(subset=['sat_math', 'sat_verbal'])
    x = subset['sat_math']
    y = subset['sat_verbal']

    model = linregress(x, y)._asdict()
    model['x_bar'] = x.mean()
    model['y_bar'] = y.mean()
    model['std_x'] = x.std()
    model['std_y'] = y.std()

    # standard deviation of the residuals around the regression line
    model['std_resid'] = np.sqrt(y.std()**2 * (1 - model['rvalue']**2))

    # predicted gap between math and verbal scores at math = 750
    model['diff'] = 750 - (model['slope'] * 750 + model['intercept'])
    return model
Here are the results for male and female participants.
model_male = run_regress(male)
model_male
{'slope': 0.7178981765301716, 'intercept': 131.23421699076414, 'rvalue': 0.7364655846849872, 'pvalue': 9.366475010436844e-112, 'stderr': 0.025944535141853017, 'intercept_stderr': 13.754270755516483, 'x_bar': 517.608024691358, 'y_bar': 502.8240740740741, 'std_x': 114.68249590043382, 'std_y': 111.79117721035935, 'std_resid': 75.62393800388467, 'diff': 80.34215061160717}
model_female = run_regress(female)
model_female
{'slope': 0.7529532783615287, 'intercept': 131.58465185439144, 'rvalue': 0.7424724177662607, 'pvalue': 2.7404858746767917e-132, 'stderr': 0.024838864388664287, 'intercept_stderr': 12.454336396031112, 'x_bar': 490.53333333333336, 'y_bar': 500.93333333333334, 'std_x': 103.91653903494971, 'std_y': 105.38344168763642, 'std_resid': 70.59390551775333, 'diff': 53.70038937446202}
We have seen most of these statistics before, but the regression parameters are new. Note that the slope of the regression line is 0.75 for women and 0.72 for men, which means that each additional point on the math section corresponds to a larger increase in expected verbal score for women than for men. And that's consistent with what we saw using local regression.
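As a check on these parameters, we can evaluate both regression lines at a math score of 750 (a small sketch using the model dictionaries above):

# predicted verbal score at math = 750, for each fitted model
for name, model in [('male', model_male), ('female', model_female)]:
    pred = model['slope'] * 750 + model['intercept']
    print(name, pred)

The predictions, about 670 for men and 696 for women, reproduce the diff values of roughly 80 and 54 points.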
Now we can use these models to simulate larger datasets, which will make it possible to explore farther into the tails. The following function takes a model and uses Gaussian distributions to generate a sample with the same parameters.
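Concretely, the model draws math scores as $x \sim \mathcal{N}(\bar{x}, \sigma_x)$ and then draws verbal scores conditionally as $y \mid x \sim \mathcal{N}(\mathrm{slope} \cdot x + \mathrm{intercept},\ \sigma_{\mathrm{resid}})$, which preserves the means, the standard deviations, and the regression relationship.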
from scipy.stats import norm
def resample_normal(model, n):
    # draw math scores from a Gaussian with the observed mean and std
    mu_x = model['x_bar']
    sigma_x = model['std_x']
    xs = norm(mu_x, sigma_x).rvs(n)

    # print the fraction of the modeled population above 700, for comparison
    over700 = norm(mu_x, sigma_x).sf(700)
    print(over700)

    # draw verbal scores around the regression line, with the residual std
    mu_y = xs * model['slope'] + model['intercept']
    sigma_y = model['std_resid']
    ys = norm(mu_y, sigma_y).rvs(n)

    return pd.DataFrame(dict(sat_math=xs, sat_verbal=ys))
Here's a sample based on the male model.
sample_male = resample_normal(model_male, 1000000)
sample_male.head()
0.05587141804055689
| | sat_math | sat_verbal |
|---|---|---|
| 0 | 549.290886 | 608.282316 |
| 1 | 304.914648 | 329.386366 |
| 2 | 589.158561 | 633.972882 |
| 3 | 648.955182 | 619.365723 |
| 4 | 636.555616 | 588.038710 |
And we can confirm that the summary statistics are about right.
run_regress(sample_male)
{'slope': 0.7176117527952254, 'intercept': 131.42994588189276, 'rvalue': 0.7354400554847134, 'pvalue': 0.0, 'stderr': 0.0006611645370343535, 'intercept_stderr': 0.3505110789379023, 'x_bar': 517.6181002913617, 'y_bar': 502.8787781105116, 'std_x': 114.5514635830689, 'std_y': 111.77454362738786, 'std_resid': 75.73728964910423, 'diff': 80.36123952168816}
Compared to the model it is based on.
model_male
{'slope': 0.7178981765301716, 'intercept': 131.23421699076414, 'rvalue': 0.7364655846849872, 'pvalue': 9.366475010436844e-112, 'stderr': 0.025944535141853017, 'intercept_stderr': 13.754270755516483, 'x_bar': 517.608024691358, 'y_bar': 502.8240740740741, 'std_x': 114.68249590043382, 'std_y': 111.79117721035935, 'std_resid': 75.62393800388467, 'diff': 80.34215061160717}
Here's a sample based on the female model.
sample_female = resample_normal(model_female, 1000000)
sample_female.head()
0.021914621151376847
| | sat_math | sat_verbal |
|---|---|---|
| 0 | 381.189235 | 361.843294 |
| 1 | 518.926570 | 604.141411 |
| 2 | 500.096698 | 514.488709 |
| 3 | 447.117311 | 387.088183 |
| 4 | 604.412694 | 532.195418 |
And we can confirm that the summary statistics are about right.
run_regress(sample_female)
{'slope': 0.7523444964587589, 'intercept': 131.92380626866048, 'rvalue': 0.7424020891829097, 'pvalue': 0.0, 'stderr': 0.0006789274327209064, 'intercept_stderr': 0.3403908546824145, 'x_bar': 490.46814940959257, 'y_bar': 500.92481916527976, 'std_x': 103.96375681543968, 'std_y': 105.35606164222709, 'std_resid': 70.58377592684528, 'diff': 53.81782138727044}
Compared to the model it's based on.
model_female
{'slope': 0.7529532783615287, 'intercept': 131.58465185439144, 'rvalue': 0.7424724177662607, 'pvalue': 2.7404858746767917e-132, 'stderr': 0.024838864388664287, 'intercept_stderr': 12.454336396031112, 'x_bar': 490.53333333333336, 'y_bar': 500.93333333333334, 'std_x': 103.91653903494971, 'std_y': 105.38344168763642, 'std_resid': 70.59390551775333, 'diff': 53.70038937446202}
Now let's check whether the samples we generated yield similar results when we compute sex ratios. Here's what we get when we use the male sample to simulate the filtering process.
thresh = 700
percents_male = simulate_filter(sample_male, thresh, thresh)
percents_male
{'n': 1000000, 'n_math': 55456, 'n_math_no_verbal': 36272, 'pct_math': 5.545599999999999, 'pct_math_no_verbal': 3.6271999999999998, 'pct_verbal_given_math': 34.593190998268895}
And let's compare it to the results with the actual data.
simulate_filter(male, thresh, thresh)
{'n': 648, 'n_math': 40, 'n_math_no_verbal': 28, 'pct_math': 6.172839506172839, 'pct_math_no_verbal': 4.320987654320987, 'pct_verbal_given_math': 30.0}
It turns out that there are discrepancies bigger than we would expect due to random sampling. For example, in the original dataset, 6.2% of male participants meet the math requirement; in the resampled data, it's only 5.5%.
We'll see what's going on in the next section, but first let's run the same analysis with the sample based on the female model:
percents_female = simulate_filter(sample_female, thresh, thresh)
percents_female
{'n': 1000000, 'n_math': 21834, 'n_math_no_verbal': 12312, 'pct_math': 2.1834, 'pct_math_no_verbal': 1.2312, 'pct_verbal_given_math': 43.61088211046991}
And compare it with the results from the female data.
simulate_filter(female, thresh, thresh)
{'n': 750, 'n_math': 26, 'n_math_no_verbal': 16, 'pct_math': 3.4666666666666663, 'pct_math_no_verbal': 2.1333333333333333, 'pct_verbal_given_math': 38.46153846153847}
Again, there are non-negligible differences. And these differences also affect the predicted sex ratios.
compute_ratios(percents_male, percents_female)
{'sex_ratio1': 2.5398919116973526, 'verbal_ratio': 1.2606782101325165, 'sex_ratio2': 2.946068875893437, 'factor_c': 1.1599189958932723}
In the Gaussian model, male participants are 2.5 times more likely to meet the math requirement (compared to 1.8 in the actual data) and the sex ratio we expect in the math-intensive job is about 2.9 (compared to 2.0 in the actual data).
To see what's going on, let's compare the distribution of math scores in the original data and in the Gaussian model. Here are the distributions for the male participants.
import seaborn as sns
sns.kdeplot(male['sat_math'], cut=0, label='data')
sns.kdeplot(sample_male['sat_math'], label='model')
decorate()
Based on the mean and standard deviation of SAT scores, we expect the tails to extend below 200 and above 800. But SAT scores are truncated at these bounds, and the scores are somewhat less accurate above 700 and below 300, compared to scores closer to the mean. So people who might score 810 on an exam with a wider range end up spread out in the 700s, more or less at random, based on the results from a small number of questions. As a result, the Gaussian model departs from the data in the tails.
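To quantify the discrepancy, we can compare the empirical tail fraction with the Gaussian prediction (a quick check based on values already computed above):

# fraction of male math scores at or above 700: data vs. Gaussian model
valid = male['sat_math'].dropna()
print('data :', (valid >= 700).mean())
print('model:', norm(model_male['x_bar'], model_male['std_x']).sf(700))

In the data, about 6.2% of male participants score 700 or more; the Gaussian model predicts about 5.6%.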
We see the same effect for female participants.
sns.kdeplot(female['sat_math'], cut=0, label='data')
sns.kdeplot(sample_female['sat_math'], label='model')
decorate()
The truncation of SAT scores at the high end has a substantial effect on the predicted sex ratios in a math-intensive job. In particular, it seems to mitigate the effect of the filtering processes, compared to a test that extends farther into the tails.
Nevertheless, we can use the Gaussian model to see what happens if we increase the threshold, assuming that it is based on a test that extends farther into the tails as, for example, the GRE might.
If we increase the thresholds to 750, fewer people satisfy the requirements.
thresh = 750
percents_male = simulate_filter(sample_male, thresh, thresh)
percents_male
{'n': 1000000, 'n_math': 21185, 'n_math_no_verbal': 15609, 'pct_math': 2.1185, 'pct_math_no_verbal': 1.5609, 'pct_verbal_given_math': 26.320509794666037}
percents_female = simulate_filter(sample_female, thresh, thresh)
percents_female
{'n': 1000000, 'n_math': 6311, 'n_math_no_verbal': 4136, 'pct_math': 0.6311, 'pct_math_no_verbal': 0.41359999999999997, 'pct_verbal_given_math': 34.46363492315005}
And the sex ratio in math-intensive jobs gets higher.
compute_ratios(percents_male, percents_female)
{'sex_ratio1': 3.3568372682617653, 'verbal_ratio': 1.3093832601272128, 'sex_ratio2': 3.773936170212766, 'factor_c': 1.1242535364745228}
With this threshold, male participants are 3.4 times more likely to meet the math requirement, but female participants who meet the math requirement are 1.3 times more likely to ALSO meet the verbal requirement.
If all people who meet both requirements choose a different job, the sex ratio we expect to see in the math-intensive job is about 3.8:1. The effect of factor (c) is to increase the sex ratio by about 12%.
If we increase the threshold to 800, fewer than 1% of people meet either requirement.
thresh = 800
percents_male = simulate_filter(sample_male, thresh, thresh)
percents_male
{'n': 1000000, 'n_math': 6828, 'n_math_no_verbal': 5426, 'pct_math': 0.6828, 'pct_math_no_verbal': 0.5426, 'pct_verbal_given_math': 20.53309900410076}
percents_female = simulate_filter(sample_female, thresh, thresh)
percents_female
{'n': 1000000, 'n_math': 1480, 'n_math_no_verbal': 1100, 'pct_math': 0.148, 'pct_math_no_verbal': 0.11, 'pct_verbal_given_math': 25.675675675675674}
And the sex ratios are higher.
compute_ratios(percents_male, percents_female)
{'sex_ratio1': 4.613513513513514, 'verbal_ratio': 1.2504530207811235, 'sex_ratio2': 4.932727272727273, 'factor_c': 1.0691910315811897}
With these thresholds, male participants are 4.6 times more likely to meet the math requirement, but female participants who meet the math requirement are 1.25 times more likely to ALSO meet the verbal requirement.
If all people who meet both requirements choose a different job, the sex ratio we expect to see in the math-intensive job is about 4.9:1. The effect of factor (c) is to increase the sex ratio by about 7%.
In summary, the stricter the math and verbal requirements are, the larger the effect of factor (b). At every level, factor (c) has the effect of increasing the sex ratio we expect in a math-intensive job, but the increase is only 7-14%, and might get smaller as the requirements get stricter.
And this is probably an upper bound on the effect of factor (c), since it assumes that everyone who qualifies for the verbal-intensive job chooses to do it instead of the math-intensive job.
Having made this model, we can use it to answer a question related to factor (b):
(b) more men than women score in the extreme math-proficient range on gatekeeper tests, such as the SAT Mathematics and the Graduate Record Examinations Quantitative Reasoning sections;
There are more men in the tail of the distribution because their average is higher, but also because their variance is higher. So we might wonder what part of the sex ratio in the tail is explained by the difference in the means and what part by the difference in variance.
Continuing the previous example with threshold 800, the sex ratio among people who exceed this threshold is 4.6:1.
compute_ratios(percents_male, percents_female)
{'sex_ratio1': 4.613513513513514, 'verbal_ratio': 1.2504530207811235, 'sex_ratio2': 4.932727272727273, 'factor_c': 1.0691910315811897}
Now let's see what would happen if the male mean were the same as the female mean.
model_male1 = model_male.copy()
model_male1['x_bar'] = 502
model_male1
{'slope': 0.7178981765301716, 'intercept': 131.23421699076414, 'rvalue': 0.7364655846849872, 'pvalue': 9.366475010436844e-112, 'stderr': 0.025944535141853017, 'intercept_stderr': 13.754270755516483, 'x_bar': 502, 'y_bar': 502.8240740740741, 'std_x': 114.68249590043382, 'std_y': 111.79117721035935, 'std_resid': 75.62393800388467, 'diff': 80.34215061160717}
model_female1 = model_female.copy()
model_female1['x_bar'] = 502
model_female1
{'slope': 0.7529532783615287, 'intercept': 131.58465185439144, 'rvalue': 0.7424724177662607, 'pvalue': 2.7404858746767917e-132, 'stderr': 0.024838864388664287, 'intercept_stderr': 12.454336396031112, 'x_bar': 502, 'y_bar': 500.93333333333334, 'std_x': 103.91653903494971, 'std_y': 105.38344168763642, 'std_resid': 70.59390551775333, 'diff': 53.70038937446202}
The following function takes the two counterfactual models, generates samples from each, and computes the sex ratio in the tail of the distribution.
def run_analysis(model_male, model_female, thresh=800):
    sample_male = resample_normal(model_male, 1000000)
    percents_male = simulate_filter(sample_male, thresh, thresh)

    sample_female = resample_normal(model_female, 1000000)
    percents_female = simulate_filter(sample_female, thresh, thresh)

    ratios = compute_ratios(percents_male, percents_female)
    pprint(ratios)
If the male and female means are the same, but the male variance is higher, the sex ratio in the tail is about 2.2.
run_analysis(model_male1, model_female1)
0.042128223945598495
0.02836565600695388
{'factor_c': 1.084553701087678,
 'sex_ratio1': 2.218451242829828,
 'sex_ratio2': 2.4060295060936494,
 'verbal_ratio': 1.3285781038520206}
Now let's see what happens if the distributions have different means and the same variance.
model_male2 = model_male.copy()
model_male2['std_x'] = 109
model_female2 = model_female.copy()
model_female2['std_x'] = 109
run_analysis(model_male2, model_female2)
0.04713207262057749
0.02732096935190762
{'factor_c': 1.115579785162076,
 'sex_ratio1': 2.1703637976929904,
 'sex_ratio2': 2.4212139791538934,
 'verbal_ratio': 1.4338670688894304}
If the distributions have the same variance and different means, the sex ratio in the tail is about 2.2. So the contributions of the mean and variance are roughly equal.
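As a sanity check on both counterfactuals, we can compute the same tail ratios in closed form from the Gaussian survival function, rather than by simulation (a sketch; it should agree with the simulated ratios up to sampling noise):

# equal means (502), different standard deviations
r1 = (norm(502, model_male['std_x']).sf(800) /
      norm(502, model_female['std_x']).sf(800))

# different means, equal standard deviation (109)
r2 = (norm(model_male['x_bar'], 109).sf(800) /
      norm(model_female['x_bar'], 109).sf(800))

print(r1, r2)

Both ratios come out between roughly 2.1 and 2.3, consistent with the simulations.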
And, just to confirm, if the mean and variance are the same, the ratio is close to 1.
model_male3 = model_male.copy()
model_male3['x_bar'] = 502
model_male3['std_x'] = 109
model_female3 = model_female.copy()
model_female3['x_bar'] = 502
model_female3['std_x'] = 109
run_analysis(model_male3, model_female3)
0.034645799449790106
0.034645799449790106
{'factor_c': 1.1460984634238311,
 'sex_ratio1': 1.0064516129032257,
 'sex_ratio2': 1.1534926470588236,
 'verbal_ratio': 1.5245267054468534}