Counting households using Tax-Calculator

Using the CPS h_seq (household sequence number) field to aggregate tax units by household.

Data: CPS | Tax year: 2017 | Author: Max Ghenis | Date run: 2018-04-08

Setup

Imports

In [1]:

import taxcalc as tc
import pandas as pd
import numpy as np

In [2]:

tc.__version__

Out[2]:

'0.19.0'

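Settings

In [3]:

# Show one decimal place in printed output.
# 'display.precision' is the full option name; the bare 'precision' alias is ambiguous in newer pandas.
pd.set_option('display.precision', 1)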

Data

In [4]:

recs = tc.Records.cps_constructor()
calc = tc.Calculator(records=recs, policy=tc.Policy(), verbose=False)
calc.advance_to_year(2017)
calc.calc_all()

In [5]:

tu = calc.dataframe(['s006', 'h_seq', 'ffpos',  
                     # 'FLPDYR',  # 2014 for all records.
                     'XTOT', 'expanded_income'])
tu['filers'] = 1
tu['XTOT_m'] = tu.XTOT * tu.s006 / 1e6
tu['expanded_income_b'] = tu.expanded_income * tu.s006 / 1e9

Create household-level dataset

In [6]:

# Drop ffpos (a family identifier, not a quantity) and sum tax units within each household.
hh = tu.drop(columns='ffpos').groupby(['h_seq']).sum()
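A minimal sketch of what this aggregation does, using three hypothetical tax-unit records in place of the real calc.dataframe() output (all values are made up for illustration). Summing the constant filers column within each h_seq yields the number of tax units per household, which the weighting approaches below rely on.

```python
import pandas as pd

# Hypothetical toy tax units: household 1 has two filers, household 2 has one.
tu = pd.DataFrame({
    'h_seq': [1, 1, 2],
    'ffpos': [1, 2, 1],
    's006': [100.0, 300.0, 200.0],
    'XTOT': [2, 1, 3],
    'expanded_income': [50e3, 20e3, 80e3],
})
tu['filers'] = 1

# Summing within h_seq turns 'filers' into the tax-unit count per household.
hh = tu.drop(columns='ffpos').groupby(['h_seq']).sum()

# household 1: filers=2, s006=400.0, XTOT=3; household 2: filers=1, s006=200.0, XTOT=3
print(hh[['filers', 's006', 'XTOT']])
```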

Add maximum s006 per household to test Ernie Tedeschi's idea.

In [7]:

hh_max_s006 = tu[['h_seq', 's006']].groupby('h_seq').max()
hh_max_s006.columns = ['max_s006']

In [8]:

hh = pd.merge(hh, hh_max_s006, left_index=True, right_index=True)
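Since both frames are indexed by h_seq, the index-on-index merge simply attaches max_s006 to each household row. The sketch below, on hypothetical toy data, checks that a single named aggregation (SeriesGroupBy.agg with keyword arguments, available in pandas 0.25 and later, so newer than the stack used in this notebook) gives the same result. It is offered as an alternative, not as the notebook's method.

```python
import pandas as pd

# Hypothetical toy records: household 1 has two filers, household 2 has one.
tu = pd.DataFrame({'h_seq': [1, 1, 2], 's006': [100.0, 300.0, 200.0]})

# Notebook approach: sum and max computed separately, then merged on the index.
hh = tu.groupby('h_seq').sum()
hh_max_s006 = tu[['h_seq', 's006']].groupby('h_seq').max()
hh_max_s006.columns = ['max_s006']
hh = pd.merge(hh, hh_max_s006, left_index=True, right_index=True)

# Alternative: both statistics in one named aggregation (pandas >= 0.25).
alt = tu.groupby('h_seq')['s006'].agg(s006='sum', max_s006='max')

print(hh)
```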

Check aggregates

Verify expanded income total

In [9]:

(round(tu.expanded_income_b.sum()), round(hh.expanded_income_b.sum()))

Out[9]:

(13196.0, 13196.0)

In [10]:

(round(tu.XTOT_m.sum()), round(hh.XTOT_m.sum()))

Out[10]:

(330.0, 330.0)
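These checks pass because each record is multiplied by its own s006 before grouping, so aggregating to households only regroups the terms of the same sum. A sketch on hypothetical toy records, contrasting that with applying a household's summed weight to its summed income, which pairs each filer's income with the other filers' weights as well:

```python
import pandas as pd

# Hypothetical toy tax units: household 1 has two filers with different weights.
tu = pd.DataFrame({
    'h_seq': [1, 1, 2],
    's006': [100.0, 300.0, 200.0],
    'expanded_income': [50e3, 20e3, 80e3],
})

# Weight each record first (as the notebook does), then aggregate.
tu['expanded_income_b'] = tu.expanded_income * tu.s006 / 1e9
hh = tu.groupby('h_seq').sum()

tu_total = tu.expanded_income_b.sum()   # about 0.027 (billions)
hh_total = hh.expanded_income_b.sum()   # same terms, regrouped: about 0.027

# Weighting after aggregation multiplies summed income by summed weight,
# so it no longer matches the record-level total.
naive_total = (hh.expanded_income * hh.s006).sum() / 1e9  # about 0.044
```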

Count households

Four approaches.

Normalize each household's weight by number of filers

Is this right? For each household $h$ with $n_h$ filers, assign the mean of its filers' weights:

$w_h = \frac{1}{n_h} \sum_{i \in h} s006_i$

In [11]:

hh['hh_s006'] = hh.s006 / hh.filers
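On hypothetical toy data, s006 / filers is just the within-household mean of the filers' weights. A sketch checking that equivalence:

```python
import pandas as pd

# Hypothetical toy records: household 1 has filers weighted 100 and 300.
tu = pd.DataFrame({'h_seq': [1, 1, 2], 's006': [100.0, 300.0, 200.0]})
tu['filers'] = 1

hh = tu.groupby('h_seq').sum()
hh['hh_s006'] = hh.s006 / hh.filers   # (100 + 300) / 2 = 200 for household 1

# Same quantity computed directly as a group mean.
mean_s006 = tu.groupby('h_seq')['s006'].mean()
```

Here two filers weighted 100 and 300 count as one household weighted 200, so the household total (400) falls below the tax-unit total (600).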

Match to Census household count

FRED series TTLHH (https://fred.stlouisfed.org/series/TTLHH) estimates 126,224,000 total households in 2017.

Multiply this total by each household's share of total s006.

In [12]:

hh['s006_share'] = hh.s006 / hh.s006.sum()
TOTAL_HHS = 126224e3
hh['hh_s006_census'] = hh.s006_share * TOTAL_HHS
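Because the shares sum to one, this rescaling pins the weighted household count to TOTAL_HHS by construction, so the 126.2 million in Out[14] reflects the FRED input rather than an independent estimate. A sketch with two hypothetical households:

```python
import pandas as pd

TOTAL_HHS = 126224e3  # FRED TTLHH, 2017 (value from the notebook)

# Hypothetical household-level summed weights.
hh = pd.DataFrame({'s006': [400.0, 200.0]},
                  index=pd.Index([1, 2], name='h_seq'))

hh['s006_share'] = hh.s006 / hh.s006.sum()   # 2/3 and 1/3
hh['hh_s006_census'] = hh.s006_share * TOTAL_HHS

total = hh.hh_s006_census.sum()  # equals TOTAL_HHS up to floating-point rounding
```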

Calculate weights based on variable totals

In [13]:

# Implied household weight: weighted total divided by unweighted total.
hh['XTOT_s006'] = 1e6 * hh.XTOT_m / hh.XTOT
hh['expanded_income_s006'] = 1e9 * hh.expanded_income_b / hh.expanded_income
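Since XTOT_m and expanded_income_b were weighted record by record before summing, each ratio is a weighted mean of the filers' s006 values, with XTOT or expanded_income as the weights. A sketch checking this for XTOT on hypothetical toy data. Note (an observation, not something the notebook tests) that the ratio is undefined for a household whose summed XTOT or expanded_income is zero, and is not a proper average when some incomes are negative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records: household 1 has filers with XTOT 2 and 1.
tu = pd.DataFrame({
    'h_seq': [1, 1, 2],
    's006': [100.0, 300.0, 200.0],
    'XTOT': [2, 1, 3],
})
tu['XTOT_m'] = tu.XTOT * tu.s006 / 1e6

hh = tu.groupby('h_seq').sum()
hh['XTOT_s006'] = 1e6 * hh.XTOT_m / hh.XTOT  # (2*100 + 1*300) / 3 for household 1

# Cross-check: XTOT-weighted average of the filers' s006 within each household.
check = tu.groupby('h_seq')[['s006', 'XTOT']].apply(
    lambda g: np.average(g.s006, weights=g.XTOT))
```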

Add maximum `s006` per household

Ernie Tedeschi's idea.

No additional code needed: max_s006 was already merged into hh above.
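With positive weights, a household's largest filer weight always lies between the filers' mean weight (the hh_s006 approach) and their sum, so the max_s006 total must land between those two totals. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy records: three households with 2, 1 and 3 filers.
tu = pd.DataFrame({
    'h_seq': [1, 1, 2, 3, 3, 3],
    's006': [100.0, 300.0, 200.0, 50.0, 50.0, 80.0],
})
g = tu.groupby('h_seq')['s006']
summary = pd.DataFrame({
    'mean_s006': g.mean(),   # the hh_s006 approach
    'max_s006': g.max(),     # Ernie Tedeschi's idea
    'sum_s006': g.sum(),     # raw summed weight
})

# Column totals: mean 460.0, max 580.0, sum 780.0
print(summary.sum())
```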

Compare

Sum of each household weight, in millions.

In [14]:

hh[['hh_s006', 'hh_s006_census', 'XTOT_s006', 
    'expanded_income_s006', 'max_s006']].sum() / 1e6

Out[14]:

hh_s006                  52.9
hh_s006_census          126.2
XTOT_s006                51.5
expanded_income_s006     52.3
max_s006                 75.2
dtype: float64

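Recover expanded income total

Neither the filer-normalized weights (hh_s006) nor the Census-matched weights (hh_s006_census) recovers the tax-unit expanded income total of 13,196 (billions of dollars) computed above.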

In [15]:

round((hh.expanded_income * hh.hh_s006).sum() / 1e9)

Out[15]:

15991.0

In [16]:

round((hh.expanded_income * hh.hh_s006_census).sum() / 1e9)

Out[16]:

109675.0
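One way to see why: applying a household weight to summed household income reproduces the tax-unit total only if that weight matches how the income was weighted record by record. A sketch on hypothetical toy data comparing the filer-mean weight with the income-based weight (expanded_income_s006), which recovers the total by construction whenever summed income is nonzero. This is an arithmetic illustration, not a check the notebook runs.

```python
import pandas as pd

# Hypothetical toy tax units; household 1's filers have unequal weights.
tu = pd.DataFrame({
    'h_seq': [1, 1, 2],
    's006': [100.0, 300.0, 200.0],
    'expanded_income': [50e3, 20e3, 80e3],
})
tu['filers'] = 1
tu['expanded_income_b'] = tu.expanded_income * tu.s006 / 1e9

hh = tu.groupby('h_seq').sum()
hh['hh_s006'] = hh.s006 / hh.filers
hh['expanded_income_s006'] = 1e9 * hh.expanded_income_b / hh.expanded_income

true_total = tu.expanded_income_b.sum()                            # about 0.027
mean_weight_total = (hh.expanded_income * hh.hh_s006).sum() / 1e9  # about 0.030
income_weight_total = (
    hh.expanded_income * hh.expanded_income_s006).sum() / 1e9      # about 0.027
```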

Replicate by household-family

FRED reports 82,827,000 family households in 2017 (https://fred.stlouisfed.org/series/TTLFHH). This figure isn't directly comparable, since family households exclude single-person households.

In [17]:

BYVARS = ['h_seq', 'ffpos']

In [18]:

fhh = tu.groupby(BYVARS).sum()

In [19]:

fhh_max_s006 = tu[BYVARS + ['s006']].groupby(BYVARS).max()
fhh_max_s006.columns = ['max_s006']
fhh = pd.merge(fhh, fhh_max_s006, left_index=True, right_index=True)

In [20]:

fhh['hh_s006'] = fhh.s006 / fhh.filers
fhh['XTOT_s006'] = 1e6 * fhh.XTOT_m / fhh.XTOT
fhh['expanded_income_s006'] = (
    1e9 * fhh.expanded_income_b / fhh.expanded_income)
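Grouping on both h_seq and ffpos splits a household into its families, so the same weight formulas apply per family. A sketch with hypothetical toy data in which household 1 contains two families:

```python
import pandas as pd

BYVARS = ['h_seq', 'ffpos']

# Hypothetical toy records: household 1 holds family 1 (two filers) and family 2.
tu = pd.DataFrame({
    'h_seq': [1, 1, 1, 2],
    'ffpos': [1, 1, 2, 1],
    's006': [100.0, 300.0, 150.0, 200.0],
})
tu['filers'] = 1

fhh = tu.groupby(BYVARS).sum()           # MultiIndex (h_seq, ffpos)
fhh['hh_s006'] = fhh.s006 / fhh.filers   # mean filer weight per family

hh = tu.drop(columns='ffpos').groupby('h_seq').sum()
hh['hh_s006'] = hh.s006 / hh.filers      # mean filer weight per household
```

Counting families adds units (3 instead of 2 here), which is consistent with the family-level totals in Out[21] exceeding the household-level totals in Out[14].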

In [21]:

fhh[['hh_s006', 'XTOT_s006', 'expanded_income_s006', 'max_s006']].sum() / 1e6

Out[21]:

hh_s006                 68.9
XTOT_s006               67.4
expanded_income_s006    67.9
max_s006                90.6
dtype: float64