U.S. GDP vs. Wage Income

For every wage dollar paid, what is GDP output?

In answering this question, we derive a model for GDP growth based on observations from wage growth.

Dependencies:
- Linux, bash
- Python: matplotlib, pandas
- Modules: yi_1tools.py, yi_fred.py, yi_plot.py, yi_timeseries.py

CHANGE LOG

2014-12-07  Update code and commentary.
2014-08-15  First version.

In [1]:

#  NOTEBOOK settings and system details:

#  Assume that the backend is LINUX (our particular distro is Ubuntu, running bash shell):
print '\n ::  TIMESTAMP of last notebook execution:'
!date
print '\n ::  IPython version:'
!ipython --version

#  Automatically reload modified modules:
%load_ext autoreload
%autoreload 2   #  0 will disable autoreload.
#  Generate plots inside notebook:
%matplotlib inline

#  DISPLAY options
from IPython.display import Image 
#  e.g. Image(filename='holt-winters-equations.png', embed=True)
from IPython.display import YouTubeVideo
#  e.g. YouTubeVideo('1j_HxD4iLn8')
from IPython.display import HTML # useful for snippets
#  e.g. HTML('<iframe src=http://en.mobile.wikipedia.org/?useformat=mobile width=700 height=350></iframe>')
import pandas as pd
print '\n ::  pandas version:'
print pd.__version__
#      In the notebook, pandas renders DataFrames as HTML by default;
#      disable that to get a plain-text representation:
#      [Deprecated: pd.core.format.set_printoptions( notebook_repr_html=True ) ]
pd.set_option( 'display.notebook_repr_html', False )

#  MATH display, use %%latex, rather than the following:
#                from IPython.display import Math
#                from IPython.display import Latex

print '\n ::  Working directory (set as $workd):'
workd, = !pwd
print workd + '\n'

 ::  TIMESTAMP of last notebook execution:
Tue Dec  9 19:52:44 PST 2014

 ::  IPython version:
2.3.0

 ::  pandas version:
0.15.0

 ::  Working directory (set as $workd):
/home/yaya/Dropbox/ipy/fecon235/nb

In [2]:

#  Some useful modules: 
from yi_1tools import *
from yi_fred import *
from yi_plot import *
from yi_timeseries import *

Examine U.S. population statistics

In [3]:

#  Total US population in millions, released monthly:
pop = getfred( m4pop )/1000.0

In [4]:

plotfred( pop )

In [5]:

georet( pop, 12 )

Out[5]:

[1.14, 1.14, 0.09, 12]

This gives the annualized geometric growth rate. One might also look at fertility rates, which sustain the population: about 2.1 children per woman is roughly the replacement level needed to keep a population stable, absent migration. (Compare Japan, where fertility rates have been declining for decades.)
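To make "annualized geometric growth rate" concrete, here is a minimal sketch that compounds the total growth over the sample back to a per-year rate. It rests on our own assumptions: a level series sampled 12 times per year, held in a plain Python list. It is not necessarily how georet computes its figures, and the function name annualized_geometric_growth is hypothetical.

```python
def annualized_geometric_growth(levels, periods_per_year=12):
    """Annualized compound growth rate, in percent, of a level series.

    levels: observations in time order (e.g. monthly population).
    periods_per_year: sampling frequency, 12 for monthly data.
    """
    if len(levels) < 2:
        raise ValueError("need at least two observations")
    # Number of years spanned by the sample (float division on purpose)
    years = (len(levels) - 1) / float(periods_per_year)
    # Total growth factor from the first to the last observation
    total_factor = levels[-1] / float(levels[0])
    # Compound rate r such that (1 + r) ** years == total_factor
    return (total_factor ** (1.0 / years) - 1.0) * 100.0


# Usage: a series growing exactly 1% per month for 24 months
monthly = [100.0 * 1.01 ** k for k in range(25)]
rate = annualized_geometric_growth(monthly)  # about 12.68 (% per year)
```

For a series growing exactly 1% per month, the result is (1.01^12 - 1) * 100, about 12.68% per year. That is slightly more than 12 * 1% because of compounding.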

In [6]:

#  Fraction of population which works:
emppop = getfred( m4emppop )/100.0

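Workers would be employed adults, which presumably excludes children (about 20% of the population) and most elderly persons (about 14%). The working percentage has dropped dramatically, from about 64% in 2001 to about 59% recently. One caveat: the BLS employment-population ratio is measured against the civilian noninstitutional population aged 16 and over, not the total population. If m4emppop is that series, multiplying it by total population overstates the number of workers, so the worker counts below are only rough.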

In [7]:

plotfred( emppop )

In [8]:

#  Total US workers in millions:
workers = todf( pop * emppop )

In [9]:

plotfred( workers )

In [10]:

georet( workers, 12  )

Out[10]:

[1.18, 1.19, 1.17, 12]

Total population and the number of workers both grow at roughly 1.1% to 1.2% per year, but the number of workers seems to be stabilizing around 185 million people.

Examine U.S. Gross Domestic Product

In [11]:

#  Deflator:
defl = getfred( m4defl )

In [12]:

#  Nominal GDP in billions of dollars:
gdp = getfred( m4gdpus )
#  The release cycle is quarterly, but we resample to monthly,
#  in order to sync with the deflator.

#  REAL GDP in billions, expressed in current dollars:
gdpr = todf( defl * gdp )
#  We do NOT use m4gdpusr directly because that is in 2009 dollars!
#  Our deflator always uses current dollars.
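
The deflator logic can be sketched as follows. This is our assumption, not confirmed by yi_fred: we take m4defl at each date to equal the latest price level divided by the price level at that date. Multiplying a nominal series by it then restates every observation in current dollars. The helper to_current_dollars is hypothetical.

```python
def to_current_dollars(nominal, price_index):
    """Restate a nominal series in dollars of the latest period.

    nominal, price_index: equal-length sequences aligned by date.
    """
    if len(nominal) != len(price_index):
        raise ValueError("series must be aligned and of equal length")
    latest = float(price_index[-1])
    # Deflator factor per date: latest price level / price level at that date
    deflator = [latest / p for p in price_index]
    # Real value = deflator factor * nominal value
    return [d * x for d, x in zip(deflator, nominal)]


# Usage: prices rise 10% per period, and so does nominal output,
# so real output (in latest-period dollars) is flat.
real = to_current_dollars([100.0, 110.0, 121.0], [100.0, 110.0, 121.0])
```

With this construction, the most recent observation is left unchanged. Older observations are scaled up by the cumulative inflation since their date.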

In [13]:

plotfred( gdpr )

In [14]:

georet( gdpr, 12 )

Out[14]:

[2.86, 2.86, 1.08, 12]

Real GDP has grown at a geometric rate of about 2.9% per annum, presumably driven collectively by the working population.

Real GDP per worker (NOT per capita)

In [15]:

#  Real GDP per worker -- NOT per capita:
gdprworker = todf( gdpr / workers )

In [16]:

plotfred( gdprworker )
#  plotted in thousands of dollars

The chart shows that each worker recently contributes over $90,000 to real GDP

In [17]:

georet( gdprworker, 12 )

Out[17]:

[1.69, 1.7, 1.37, 12]

Each worker has become more productive over the years: real GDP per worker has grown at an annual pace of 1.7%.
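As a rough consistency check, per-worker growth should be close to the ratio of the growth factors for real GDP (2.86% from Out[14]) and workers (1.18% from Out[10]). This assumes the two series cover comparable sample periods, which may not hold exactly:

```latex
\begin{aligned}
\frac{1 + 0.0286}{1 + 0.0118} - 1 \approx 0.0166
\end{aligned}
```

That is about 1.66% per year, close to the 1.69% reported by georet.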

Examine wage income

In [18]:

#  Nominal annual income from the hourly wage,
#  assuming 40 hours per week for 50 weeks per year (2000 hours):
inc = getfred( m4wage )*2000
#  But maybe working hours have decreased recently?

In [19]:

#  REAL income in thousands per worker:
rinc = todf((defl * inc)/1000.0)

In [20]:

plotfred( rinc )

Do wages multiply out to GDP?

In [21]:

#  Ratio of real GDP to real income per worker:
gdpinc = todf( gdprworker / rinc )

Implicitly, we are assuming that all workers earn wages at the private-sector nonfarm, non-supervisory rate. Since we are interested in the multiplier effect, this is not a bad assumption for our purposes, provided that changes in wage rates apply uniformly across the various categories of workers.

In [22]:

plotfred( gdpinc )

The ratio of real GDP to real income per worker has increased from about 1.4 in the 1970s to over 2.2 recently. (There is a noticeable temporary dip after the 2007 crisis.)

This means that each wage dollar paid currently yields about $2.27 worth of goods and services.

As a cross-check, this fits our earlier finding that each worker has become more productive: real GDP per worker has grown about 1.7% per year.

Hypothesis: over the years, technology has exerted upward pressure on productivity, and downward pressure on wages. In other words, the slope of gdpinc is a function of technological advances. (Look for counterexamples in other countries.)

In [23]:

#  Fit and plot the simplified time trend:
plotfred( trend( gdpinc ))

 ::  regresstime slope = 0.00175176905443

Long-term: the fitted slope is per month, so each year adds on average about 12 * 0.00175 = 0.021 to the gdpinc multiplier.
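Here is a minimal sketch of fitting such a linear time trend and converting the slope to an annual increment. It assumes, as the 0.021 figure implies, that the regression's time variable advances by one unit per month. The helper annual_trend_slope is hypothetical and uses numpy.polyfit. It is not necessarily how regresstime or trend work internally.

```python
import numpy as np


def annual_trend_slope(values, periods_per_year=12):
    """Least-squares linear trend y = a + b*t, where t counts periods.

    Returns (slope per period, slope per year).
    """
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))  # 0, 1, 2, ...: one step per month
    # polyfit returns coefficients from the highest power down: [b, a]
    slope, _intercept = np.polyfit(t, y, 1)
    return slope, slope * periods_per_year


# Usage with the slope printed above: 0.00175 per month is about 0.021 per year
synthetic = 1.4 + 0.00175176905443 * np.arange(480)
per_month, per_year = annual_trend_slope(synthetic)
```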

Short-term: let's forecast that multiplier using the Holt-Winters method, month by month, two years into the future.

In [24]:

holtfred( gdpinc, 24 )

Out[24]:

    Forecast
0   2.265296
1   2.259082
2   2.259629
3   2.260176
4   2.260724
5   2.261271
6   2.261818
7   2.262366
8   2.262913
9   2.263460
10  2.264008
11  2.264555
12  2.265102
13  2.265650
14  2.266197
15  2.266744
16  2.267292
17  2.267839
18  2.268387
19  2.268934
20  2.269481
21  2.270029
22  2.270576
23  2.271123
24  2.271671
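
For intuition about the forecast above, here is a minimal sketch of Holt's linear (double) exponential smoothing, the non-seasonal form of Holt-Winters. The smoothing parameters alpha and beta, the initialization from the first difference, and the function name are illustrative assumptions. holtfred may use different defaults or initialization.

```python
def holt_linear_forecast(y, alpha, beta, horizon):
    """Holt's linear (double) exponential smoothing, no seasonality.

    y: observations in time order; alpha: level smoothing in (0, 1];
    beta: trend smoothing in (0, 1]; horizon: steps to forecast.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level = float(y[0])
    trend = float(y[1] - y[0])  # simple initialization from the first difference
    for obs in y[1:]:
        prev_level = level
        # New level: blend the observation with the previous one-step forecast
        level = alpha * obs + (1 - alpha) * (level + trend)
        # New trend: blend the latest level change with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast: last level plus h times the last trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Usage on an exactly linear series: the forecast just extends the line
series = [2.0 + 0.5 * t for t in range(10)]  # 2.0, 2.5, ..., 6.5
fc = holt_linear_forecast(series, alpha=0.5, beta=0.3, horizon=3)
```

Each forecast equals the last smoothed level plus h times the last smoothed trend. That is consistent with the table above, which rises by a nearly constant amount each month from row 1 onward.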

GDP/wage = 2.27 -- but historically not stable!

If that multiplier were truly constant, then mathematically an x% change in wages would translate into an x% change in GDP. This is presumably why the Fed, especially Janet Yellen, pays so much attention to wage growth. But the gdpinc chart clearly shows that the multiplier in question is not stable.

Model for GDP growth based on observations from wage growth

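We found evidence of a time-variant multiplier $m_t$ such that $G_t = m_t w_t$. Here $G_t$ is real GDP per worker, $w_t$ is real wage income per worker, and $m_t$ is the gdpinc ratio. Let's focus on GDP growth, expressed as the usual percentage change.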

In [25]:

%%latex
\begin{aligned}
\frac{G_{t+1} - G_t}{G_t} = \frac{m_{t+1} w_{t+1}}{m_t w_t} - 1
\end{aligned}

Notice that the RHS is just the growth rate of $m_t w_t$. So, abusing notation, we can write $\%(G) = \%(m w)$.
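Expanding the right-hand side shows how the multiplier and wage growth rates combine. Here $\%(\cdot)$ denotes a fractional change over one period:

```latex
\begin{aligned}
\frac{G_{t+1} - G_t}{G_t}
  &= \frac{m_{t+1}}{m_t} \cdot \frac{w_{t+1}}{w_t} - 1 \\
  &= \bigl(1 + \%(m)\bigr)\bigl(1 + \%(w)\bigr) - 1 \\
  &= \%(m) + \%(w) + \%(m)\,\%(w)
\end{aligned}
```

For small rates the cross term is negligible, so $\%(G) \approx \%(m) + \%(w)$.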

Empirically, the multiplier varies almost linearly with time. So let's evaluate GDP growth numerically, using the most recent multiplier (about 2.270) and its average historical increment per year (0.021), and assuming that wages have increased by 5% year-over-year:

In [26]:

%%latex
\begin{aligned}
\left(\frac{2.270 + 0.021}{2.270}\right) \times 1.05 - 1 \approx 0.0597
\end{aligned}

Under these assumptions, GDP grows by almost exactly 6%. So let's note that 6/5 = 1.2.
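Here is the same arithmetic as a small function, so that other inputs can be tried. The name model_gdp_growth is hypothetical. Its inputs are the most recent multiplier, its assumed annual increment, and wage growth as a fraction.

```python
def model_gdp_growth(m_latest, m_increment, wage_growth):
    """GDP growth implied by G = m * w over one year.

    m_latest: most recent multiplier; m_increment: its assumed rise over
    the year; wage_growth: fractional wage growth (0.05 means 5%).
    """
    multiplier_factor = (m_latest + m_increment) / float(m_latest)
    return multiplier_factor * (1.0 + wage_growth) - 1.0


g = model_gdp_growth(2.270, 0.021, 0.05)  # about 0.0597
ratio = g / 0.05                          # about 1.19, i.e. roughly 6/5
```

The multiplier enters as a fixed factor of about 1.0093, so the ratio of GDP growth to wage growth implied by this formula depends on the wage growth assumed. The 1.2 ratio is specific to the 5% assumption: with wage_growth = 0.006, for example, the function returns about 0.0153.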

In other words, GDP_growth = 1.2 * wage_growth, as a rough current approximation, given usual multiplier behavior.

In [27]:

#  Latest wage annual income data, in thousands of real dollars:
tail( rinc, 13 )

Out[27]:

                    Y
T                    
2013-10-01  41.154417
2013-11-01  41.205776
2013-12-01  41.246674
2014-01-01  41.279944
2014-02-01  41.441691
2014-03-01  41.349480
2014-04-01  41.296676
2014-05-01  41.270318
2014-06-01  41.271898
2014-07-01  41.293348
2014-08-01  41.429776
2014-09-01  41.385427
2014-10-01  41.400000

As of October 2014, real wage growth was +0.60% YoY. By this rough approximation, we can predict that 2014Q4 real GDP growth will be about 1.2 * 0.60% = +0.72%.
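The arithmetic behind this forecast can be reproduced from the first and last rows of Out[27] (October 2013 and October 2014). This is a minimal sketch that simply applies the 1.2 rule of thumb from above:

```python
# Real annual wage income per worker, thousands of dollars (rows of Out[27])
oct_2013 = 41.154417
oct_2014 = 41.400000

# Year-over-year real wage growth, in percent
wage_yoy = (oct_2014 / oct_2013 - 1.0) * 100.0   # about 0.597

# Rule of thumb from above: GDP growth = 1.2 * wage growth
gdp_forecast = 1.2 * wage_yoy                    # about 0.716
```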

The model is thus useful for forecasting GDP, which is released only quarterly, from wage data, which is released monthly. The multiplier effect can also be incorporated into a multivariable model.

APPENDIX 1: Linear regression shows weak correlation (~0.65) if the gdpinc multiplier is treated as time-invariant

In [28]:

stat2( gdprworker[y], rinc[y] )

 ::  FIRST variable:
count    667.000000
mean      63.761559
std       15.429600
min       36.432570
25%       53.950416
50%       62.867692
75%       75.177827
max       94.040646
Name: Y, dtype: float64

 ::  SECOND variable:
count    610.000000
mean      36.882474
std        2.224209
min       32.852482
25%       34.894259
50%       36.603215
75%       38.345661
max       41.441691
Name: Y, dtype: float64

 ::  CORRELATION
0.652122415059

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         607
Number of Degrees of Freedom:   2

R-squared:         0.4253
Adj R-squared:     0.4243

Rmse:             10.5027

F-stat (1, 605):   447.6566, p-value:     0.0000

Degrees of Freedom: model 1, resid 605

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     4.0905     0.1933      21.16     0.0000     3.7116     4.4695
     intercept   -84.5124     7.1390     -11.84     0.0000   -98.5049   -70.5199
---------------------------------End of Summary---------------------------------

APPENDIX 2: Linear regression between real GDP growth and real wage growth is useless when the multiplier is treated as if it were time-invariant

In [29]:

#  Examine year-over-year percentage growth:
stat2( pcent(gdpr, 12)[y], pcent(rinc, 12)[y] ) 

 ::  FIRST variable:
count    655.000000
mean       2.915700
std        2.469703
min       -3.828793
25%        1.770101
50%        3.072340
75%        4.321166
max        9.208310
Name: Y, dtype: float64

 ::  SECOND variable:
count    598.000000
mean       0.451297
std        1.379578
min       -3.708588
25%       -0.536752
50%        0.526949
75%        1.369087
max        4.591182
Name: Y, dtype: float64

 ::  CORRELATION
0.443828748187

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         595
Number of Degrees of Freedom:   2

R-squared:         0.1970
Adj R-squared:     0.1956

Rmse:              2.2085

F-stat (1, 593):   145.4659, p-value:     0.0000

Degrees of Freedom: model 1, resid 593

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.7903     0.0655      12.06     0.0000     0.6619     0.9187
     intercept     2.4286     0.0952      25.50     0.0000     2.2419     2.6152
---------------------------------End of Summary---------------------------------

APPENDIX 3: Linear regression of growth model with time-variant multiplier

The mw below represents the series $m_t w_t$ in our analytical model described above.

In [30]:

mw     = todf( gdpinc * rinc, 'mw' )
mwpc   = todf( pcent( mw, 12), 'mwpc' )
gdprpc = todf( pcent( gdpr, 12), 'Gpc' )
dataf = paste( [gdprpc, mwpc] )

In [31]:

#  The 0 in the formula means no intercept:
result = regressformula( dataf['1964':], 'Gpc ~ 0 + mwpc' )
print result.summary()

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    Gpc   R-squared:                       0.802
Model:                            OLS   Adj. R-squared:                  0.802
Method:                 Least Squares   F-statistic:                     2413.
Date:                Tue, 09 Dec 2014   Prob (F-statistic):          2.42e-211
Time:                        19:53:23   Log-Likelihood:                -1142.8
No. Observations:                 595   AIC:                             2288.
Df Residuals:                     594   BIC:                             2292.
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
mwpc           1.3924      0.028     49.120      0.000         1.337     1.448
==============================================================================
Omnibus:                       96.415   Durbin-Watson:                   0.102
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              167.474
Skew:                          -0.979   Prob(JB):                     4.30e-37
Kurtosis:                       4.709   Cond. No.                         1.00
==============================================================================

R-squared for dataf from 1964 onward looks respectable at around 0.80; however, the fit is terrible after the Great Recession.

The coefficient implies this fitted equation: $\%(G) = 1.39 \times \%(m w)$.

In contrast, our analytic model (as opposed to the regression model in Appendix 2) suggested for the most recent data: $\%(G) = 1.20 \times \%(w)$.
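Here is a minimal sketch of what a no-intercept formula such as 'Gpc ~ 0 + mwpc' estimates: the slope is sum(x*y)/sum(x*x). We assume regressformula wraps statsmodels; the summary format above suggests so, but that is our assumption. If so, note that statsmodels reports an uncentered R-squared when the model has no constant. The 0.802 above is then not directly comparable to the centered R-squared in Appendix 2. The helper below is hypothetical and computes both quantities by hand:

```python
def ols_through_origin(x, y):
    """Fit y = b * x by least squares with no intercept.

    Returns (b, uncentered R-squared).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = sxy / float(sxx)
    resid_ss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    # Without an intercept, total sum of squares is taken around zero,
    # not around the mean of y ("uncentered")
    uncentered_tss = sum(yi * yi for yi in y)
    return b, 1.0 - resid_ss / uncentered_tss


b, r2 = ols_through_origin([1.0, 2.0, 3.0], [1.0, 2.0, 2.0])
# b = 11/14 (about 0.786), r2 = 121/126 (about 0.960)
```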