US, real GDP vs. SP500: Holt-Winters time series forecasting

We examine the relationship between US gross domestic product and the US equity market, both in real terms. Forecasts for both are demonstrated using the Holt-Winters technique. We derive the most likely range for real GDP growth, and identify excessive equity valuations aside from inflationary pressures.

Dependencies:

- Linux, bash [not crucial, cross-platform preferred]
- Python: matplotlib, pandas [recommend Anaconda distribution]
- Modules: yi_1tools, yi_timeseries, yi_fred

CHANGE LOG

2015-02-20  Code review and revision.
2014-08-11  First version uses major revision yi_fred.

In [1]:

#  NOTEBOOK settings and system details:   [00-tpl v14.09.28]

#  Assume that the backend is LINUX (our particular distro is Ubuntu, running bash shell):
print '\n ::  TIMESTAMP of last notebook execution:'
!date
print '\n ::  IPython version:'
!ipython --version

#  Automatically reload modified modules:
%load_ext autoreload
%autoreload 2   
#           0 will disable autoreload.
#  Generate plots inside notebook:
%matplotlib inline

#  DISPLAY options
from IPython.display import Image 
#  e.g. Image(filename='holt-winters-equations.png', embed=True)
from IPython.display import YouTubeVideo
#  e.g. YouTubeVideo('1j_HxD4iLn8')
from IPython.display import HTML # useful for snippets
#  e.g. HTML('<iframe src=http://en.mobile.wikipedia.org/?useformat=mobile width=700 height=350></iframe>')
import pandas as pd
print '\n ::  pandas version:'
print pd.__version__
#      Show pandas DataFrames as plain text rather than as HTML tables:
#      [Deprecated: pd.core.format.set_printoptions( notebook_repr_html=True ) ]
pd.set_option( 'display.notebook_repr_html', False )

#  MATH display, use %%latex, rather than the following:
#                from IPython.display import Math
#                from IPython.display import Latex

print '\n ::  Working directory (set as $workd):'
workd, = !pwd
print workd + '\n'

 ::  TIMESTAMP of last notebook execution:
Sun Feb 22 10:04:45 PST 2015

 ::  IPython version:
2.3.0

 ::  pandas version:
0.15.0

 ::  Working directory (set as $workd):
/home/yaya/Dropbox/ipy/fecon235/nb

In [2]:

from yi_1tools import *
from yi_fred import *
from yi_timeseries import *

We adopt the Holt-Winters notation from Rob Hyndman's Forecasting with Exponential Smoothing (2008), ignoring the seasonal aspect of the H-W model.
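
For reference, here is a sketch of the non-seasonal (linear trend) recursions in that notation. Here $\ell_t$ is the Level, $b_t$ is the Growth, and $\alpha$ and $\beta^*$ are the smoothing parameters. We assume that the `beta` argument of `holt` plays the role of $\beta^*$; the image loaded in the next cell remains the reference for the exact form used by the module.

$$\ell_t = \alpha\, y_t + (1 - \alpha)\,(\ell_{t-1} + b_{t-1})$$

$$b_t = \beta^*\,(\ell_t - \ell_{t-1}) + (1 - \beta^*)\, b_{t-1}$$

$$\hat{y}_{t+h|t} = \ell_t + h\, b_t$$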

In [3]:

#  Note that y represents the raw series in the equations.
#  We generally use Y as a generic label in dataframes.
Image(filename='holt-winters-equations.png', embed=True)

Out[3]:

In [4]:

#  Retrieve quarterly data for real US GDP:
dfy = getfred( q4gdpusr )

In [5]:

#  Implicitly use the default values for alpha and beta:
holtdf = holt( dfy )
#  else, be more explicit, for example:
#  holtdf = holt( dfy, alpha=0.2, beta=0.3 )
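
To make the Level and Growth columns concrete, the following is a minimal sketch of the non-seasonal recursions in plain Python. It is not the code in yi_timeseries: the function name `holt_sketch`, the default parameter values, and the initialization (Level starts at the first observation, Growth at the first difference) are all assumptions made for illustration.

```python
def holt_sketch(y, alpha=0.5, beta=0.5):
    """Return (level, growth) lists from Holt's linear-trend recursions.

    Assumptions (hypothetical, not taken from yi_timeseries):
    level starts at y[0], growth starts at y[1] - y[0],
    and the defaults alpha=0.5, beta=0.5 are arbitrary.
    """
    level = [y[0]]
    growth = [y[1] - y[0]]
    for obs in y[1:]:
        prev_level, prev_growth = level[-1], growth[-1]
        # Level: blend the new observation with the one-step-ahead projection.
        new_level = alpha * obs + (1 - alpha) * (prev_level + prev_growth)
        # Growth: blend the latest change in level with the previous slope.
        new_growth = beta * (new_level - prev_level) + (1 - beta) * prev_growth
        level.append(new_level)
        growth.append(new_growth)
    return level, growth


# On a perfectly linear series, Level tracks y and Growth equals the slope.
y = [10.0 + 2.0 * t for t in range(8)]
level, growth = holt_sketch(y)
```

Smaller values of alpha and beta smooth more heavily, so Level and Growth react more slowly to new data.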

In [6]:

#  DEFINE start YEAR OF ANALYSIS:
start = '1980'

In [7]:

stats( holtdf[start:] )
#  Summary stats from custom start point:

                  Y         Level      Growth
count    140.000000    140.000000  140.000000
mean   11188.082143  11184.299755   67.854840
std     3096.838445   3095.120909   40.298109
min     6382.900000   6554.525807  -48.603874
25%     8582.750000   8565.829705   46.882610
50%    11054.100000  10971.031680   76.136541
75%    14373.800000  14505.979277   93.978754
max    16311.600000  16237.819627  137.119871

 ::  Index on min:
Y        1980-07-01
Level    1982-10-01
Growth   2009-10-01
dtype: datetime64[ns]

 ::  Index on max:
Y        2014-10-01
Level    2014-10-01
Growth   2000-04-01
dtype: datetime64[ns]

 ::  Head:
                 Y        Level     Growth
T                                         
1980-01-01  6524.9  6621.385683  61.405607
1980-04-01  6392.6  6607.341555  47.070157
1980-07-01  6382.9  6583.818667  33.657479
1980-10-01  6501.2  6587.244348  27.913437
1981-01-01  6635.7  6620.498761  28.928223
1981-04-01  6587.3  6633.273968  25.859150
1981-07-01  6662.9  6660.112507  26.045234

 ::  Tail:
                  Y         Level      Growth
T                                            
2013-04-01  15606.6  15581.894623   82.349744
2013-07-01  15779.9  15694.314831   88.063132
2013-10-01  15916.2  15817.171693   94.673941
2014-01-01  15831.7  15891.007769   90.714747
2014-04-01  16010.4  15989.178662   92.131414
2014-07-01  16205.6  16113.625456   98.271336
2014-10-01  16311.6  16237.819627  103.196675

 ::  Correlation matrix:
             Y     Level    Growth
Y       1.0000  0.999200  0.169000
Level   0.9992  1.000000  0.164448
Growth  0.1690  0.164448  1.000000

In [8]:

#  Y is the original series:
plotdf( holtdf['Y'][start:] )

In [9]:

#  Annualized geometric mean return of Y is given by:
georet( todf(holtdf['Y'][start:]), 4 )
#  Since 1990, real GDP growth is about 2.4%.
#  Since 1980, real GDP growth is about 2.7%

Out[9]:

[2.65, 2.66, 1.5, 4]
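
As a rough cross-check on the first element of that list, an annualized geometric mean return can be computed from the endpoints alone. The sketch below uses the first and last values of Y shown in the Head and Tail output above. The helper `geometric_return_pct` is hypothetical, and `georet` may use a different estimator, so the result is only expected to be close to the 2.65 it reports. The third element of the list matches the 1.5% volatility quoted later in the notebook.

```python
def geometric_return_pct(first, last, periods, freq=4):
    """Annualized geometric mean return, in percent, between two levels.

    periods is the number of steps between the two observations,
    freq is the number of steps per year (4 for quarterly data).
    """
    return 100.0 * ((last / first) ** (float(freq) / periods) - 1.0)


# Real GDP from 1980-01-01 (6524.9) to 2014-10-01 (16311.6):
# 140 quarterly observations, hence 139 steps.
gdp_growth = geometric_return_pct(6524.9, 16311.6, periods=139, freq=4)
```

The endpoint calculation gives roughly 2.67% per annum.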

In [10]:

#  Level can be thought of as the smoothed series:
plotdf( holtdf['Level'][start:] )

In [11]:

#  Growth is the fitted slope at each point:
plotdf( holtdf['Growth'][start:] )
#  It is expressed in units of the original series.

In [12]:

#  We are really interested in the forecasted
#  annualized growth rate in percentage terms:
pc = holtpc( dfy, 4 )
plotdf( pc[start:] )
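
One plausible reading of this step, offered as an assumption rather than as the definition of `holtpc`, is that the fitted Growth is divided by the fitted Level and scaled to a yearly rate. The sketch below applies a simple (non-compounded) annualization to the last row of the Tail output above; the helper name is hypothetical.

```python
def annualized_growth_pct(level, growth, freq=4):
    """Growth as a percentage of Level, scaled to a yearly rate.

    Assumes simple annualization (freq * growth / level);
    holtpc itself may compound or differ in other details.
    """
    return 100.0 * freq * growth / level


# Last fitted values from the Tail output (2014-10-01):
latest = annualized_growth_pct(16237.819627, 103.196675, freq=4)
```

Under that assumption the latest projected rate is about 2.5% per annum.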

In [13]:

#  Post-World-War 2, the projected BIG TREND for real GDP
#  growth has decreased from 3.7% per annum to a recent 2.2%.
plotdf( trend( pc ) )

 ::  regresstime slope = -0.0055426303149

In [14]:

#  Examine the variation around the BIG TREND
#  to see how "well" the economy is doing locally:
plotdf( detrend( pc ))

 ::  regresstime slope = -0.0055426303149
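
Both `trend` and `detrend` report a regresstime slope, which suggests a least-squares line fitted against time. The following is a minimal sketch of that idea in plain Python, not the module's code: the function names are hypothetical and the input series is made up for illustration.

```python
def regress_on_time(values):
    """Least-squares fit of values against t = 0, 1, 2, ...

    Returns (slope, intercept).
    """
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / float(n)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def trend_sketch(values):
    """Fitted line evaluated at each time step."""
    slope, intercept = regress_on_time(values)
    return [intercept + slope * t for t in range(len(values))]


def detrend_sketch(values):
    """Residuals around the fitted line."""
    return [v - f for v, f in zip(values, trend_sketch(values))]


# Hypothetical growth rates in percent, for illustration only:
rates = [3.9, 3.1, 3.6, 2.4, 3.0, 2.1, 2.6, 1.9]
slope, intercept = regress_on_time(rates)
residuals = detrend_sketch(rates)
```

The residuals sum to zero by construction, so their standard deviation measures the variation around the trend.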

Standard deviation = 1.53 for the variation around the BIG TREND. So at 2std, we are roughly looking at +/- 3 percentage points around the long-term trend.

2015-02-20: We can see from detrend(pc) that the US is back from the Great Recession to just mean GDP growth. "Doing just OK now." Looking forward two standard deviations away from that mean (of +2.2) gives us an estimated range of -0.8 "horrible" to +5.2 "great" for real GDP growth.
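
The arithmetic behind that range can be sketched as follows, using the trend growth of 2.2 and the standard deviation of 1.53 quoted above. The helper name is hypothetical. The text rounds two standard deviations (3.06) to 3, which is why it quotes -0.8 and +5.2.

```python
def two_sigma_band(center, sigma):
    """Return (low, high) at two standard deviations around center."""
    return center - 2.0 * sigma, center + 2.0 * sigma


# Trend growth of 2.2% and standard deviation of 1.53 from the text above:
low, high = two_sigma_band(2.2, 1.53)
```

Without rounding, the band runs from about -0.86 to about +5.26.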

Real GDP forecast using Holt-Winters

By fitting Level and Growth, H-W essentially uses the slope to make point forecasts forward in time.

In [15]:

#  Forecast 4 quarters ahead:
holtforecast( holtdf, 4 )

Out[15]:

       Forecast
0  16311.600000
1  16341.016301
2  16444.212976
3  16547.409651
4  16650.606326
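
The table above can be reproduced by hand from the last row of the Tail output: row 0 is the last actual observation, and row k is the last Level plus k times the last Growth. The sketch below assumes only that relationship; the function name is hypothetical.

```python
def holt_point_forecasts(last_actual, last_level, last_growth, h):
    """Row 0 is the last actual value; rows 1..h are Level + k * Growth."""
    return [last_actual] + [last_level + k * last_growth
                            for k in range(1, h + 1)]


# Values for 2014-10-01 taken from the Tail output above:
fc = holt_point_forecasts(16311.6, 16237.819627, 103.196675, 4)
```

The resulting list agrees with the Forecast column to within rounding of the displayed inputs.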

There is a function, holtfred, which does all the work above, given simply a fredcode.

In [16]:

#  SHORT-CUT forecast using the fredcode directly:
holtfred( q4gdpusr, 4 )

Out[16]:

       Forecast
0  16311.600000
1  16341.016301
2  16444.212976
3  16547.409651
4  16650.606326

In [17]:

#  2015-02-20
100 * ((16650.61 / 16311.60) - 1)

Out[17]:

2.078336889085075

We thus have computed the H-W forecast for the real GDP growth rate for the year ahead. (Note that this should approximately concur with our holtpc method above -- Forecast_0 is the last actual data point, rather than the last Level.)

In [18]:

#  We can plot the point forecasts 12 quarters ahead (i.e. 3 years):
plotholt( holtdf, 12 )

Real GDP vs. real S&P 500 (SPX)

In [19]:

#  SPX is a daily series, which will need to be deflated...
spx = getfred( m4spx )
#  with data on a monthly periodicity:
defl = getfred( m4defl )

 ::  S&P 500 prepend successfully goes back to 1957.

In [20]:

#  Now we synthesize a quarterly version by resampling:
spdefl = todf( spx * defl )
spq = quarterly( spdefl )
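
To illustrate what resampling to a quarterly frequency involves, here is a minimal sketch in plain Python that keeps the last observation in each quarter and labels it with the quarter's start date, as in the GDP index above. How `quarterly` actually aggregates (last value, mean, median) is not shown in this notebook, so the choice of "last" is an assumption, and both the function name and the data are hypothetical.

```python
from collections import OrderedDict
from datetime import date


def to_quarterly_last(observations):
    """Collapse (date, value) pairs to one value per quarter.

    Keeps the last observation seen in each quarter (an assumption)
    and keys the result by the quarter's start date.
    """
    quarters = OrderedDict()
    for day, value in sorted(observations):
        start_month = 3 * ((day.month - 1) // 3) + 1
        quarters[date(day.year, start_month, 1)] = value
    return quarters


# Hypothetical monthly series, for illustration only:
monthly = [(date(2014, m, 1), 100.0 + m) for m in range(1, 13)]
q = to_quarterly_last(monthly)
```

Twelve monthly points collapse to four quarterly points dated January, April, July, and October.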

In [21]:

#  Real SPX resampled quarterly:
plotfred( spq )

In [22]:

#  Geometric mean return for SPX:
georet( spq[start:], 4 )
#  cf. volatility for real GDP = 1.5% 
#      in contrast to equities = 13.5%

Out[22]:

[5.45, 6.36, 13.46, 4]

In real terms, the geometric mean return of SPX (5.45%) is roughly double that of GDP (2.65%). Next, we examine their correlation.

In [23]:

stat2( dfy['Y'][start:], spq['Y'][start:] )

 ::  FIRST variable:
count      140.000000
mean     11188.082143
std       3096.838445
min       6382.900000
25%       8582.750000
50%      11054.100000
75%      14373.800000
max      16311.600000
Name: Y, dtype: float64

 ::  SECOND variable:
count     140.000000
mean     1012.610692
std       521.424696
min       250.684806
25%       540.917122
50%      1015.851762
75%      1450.641147
max      2038.038664
Name: Y, dtype: float64

 ::  CORRELATION
0.888774115533

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         140
Number of Degrees of Freedom:   2

R-squared:         0.7899
Adj R-squared:     0.7884

Rmse:           1424.5554

F-stat (1, 138):   518.8908, p-value:     0.0000

Degrees of Freedom: model 1, resid 138

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     5.2786     0.2317      22.78     0.0000     4.8244     5.7328
     intercept  5842.9203   263.7359      22.15     0.0000  5325.9979  6359.8427
---------------------------------End of Summary---------------------------------
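
The figures in this summary can be cross-checked against each other using standard identities for a regression with a single explanatory variable: R-squared is the squared correlation, the t-stat is the coefficient divided by its standard error, and the F-stat is the squared t-stat. The sketch below only re-uses numbers printed above; small differences come from rounding in the displayed values.

```python
# Values copied from the stat2 output above:
corr = 0.888774115533
coef, std_err = 5.2786, 0.2317

r_squared = corr ** 2       # compare with R-squared: 0.7899
t_stat = coef / std_err     # compare with t-stat: 22.78
f_stat = t_stat ** 2        # compare with F-stat: 518.8908
```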

With start = '1990', the linear regression gives approximately $dG / dS = 4.1$ (the output shown above, for start = '1980', gives a slope of 5.28).

Therefore, using mean values g and s: $ (dG/g) / (dS/s) = 4.1 * (s/g) $

The RHS is roughly the ratio of percentage changes.

In [24]:

#  2015-02-20, for start=1990:  4.1 * (s/g) = 
4.1 * ( 1254.92 / 12683.95 )

Out[24]:

0.4056442985032265

Thus a 1% rise in real SPX would suggest an additional 40 bp in real US GDP [noting that $R^2 = 0.55$ for start = '1990', so it's not a great fit]. SPX is generally thought to be a leading economic indicator. (The fit improves if start is moved farther back: $R^2 = 0.79$ for start = '1980' in the regression output above.)
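
The same calculation can be written as a small helper and applied to both start dates. The 1990 inputs are those used in cell In [24]; the 1980 inputs are the slope and the two means printed in the stat2 output above. The function name is hypothetical.

```python
def elasticity(slope, mean_s, mean_g):
    """Approximate (dG/g) / (dS/s) as slope * (s / g) at the mean values."""
    return slope * mean_s / mean_g


# start = '1990', inputs from cell In [24]:
e_1990 = elasticity(4.1, 1254.92, 12683.95)

# start = '1980', inputs from the stat2 output:
e_1980 = elasticity(5.2786, 1012.610692, 11188.082143)
```

This gives about 0.41 for 1990 and about 0.48 for 1980, that is, roughly 40 to 50 bp of real GDP per 1% change in real SPX.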

Real SPX forecast using Holt-Winters

In [25]:

#  We look at projected (not actual) annual rates of return, 
#  rather than generating point forecasts:
pcspq = holtpc( spq, 4 )
plotdf( pcspq )
#  Note we use all data since 1957.

In [26]:

#  Real SPX returns are clearly very volatile,
#  so look at the long-term trend:
plotdf( trend( pcspq ))

 ::  regresstime slope = 0.00853067138105

There are occasional episodes where the forecast on smoothed SPX is worse than -20%, but over the long term US equities are showing increasingly better returns against inflation.

In [27]:

#  To identify local BUBBLES, i.e. excessive equity valuations 
#  apart from inflationary effects, we detrend as follows:
plotdf( detrend( pcspq ))

 ::  regresstime slope = 0.00853067138105

In the plot above, we have filtered out both inflation and the underlying real rate of return on equities, leaving us with a picture of excessive valuations in the short term.

The shift from overvaluation to collapse is very swift, and occurs historically on a regular but unpredictable basis. Instability is the norm.
