Model evaluation via Out-of-Sample (OOS)

On the last page, we discussed cross-validation testing as a way to develop a model that performs well in the real world:

  1. Create a holdout sample.
  2. "Train" many models on a portion of the training sample, and use each to predict $y$ on the remaining/unused part of the training sample.
  3. Repeat step 2 $k$ times, to see how consistently accurate each model is.
  4. Pick your preferred model.
  5. Apply the preferred model to the holdout sample, but you only get to try this one time.

```{warning}
This doesn't work well in all settings. The problem, and this sounds like a Yogi Berra quote, is that a lot of predictions are about the future.
```

Suppose you have stock returns for a firm over a year. You want to develop a model to predict future stock returns that you can use to become super duper rich.

If you follow the cross-validation procedure from the prior page, you'll randomly sample days during the year for your training sample, which will likely include some days from each month of the year. You'll then train a model and test it on the remaining days. But those remaining days are interleaved with the training days, so your model already knows roughly how the stock did each month. Its "accuracy" in the validation sample will be high in a way that can't be replicated in practice, where you never get to train on days that come after the day you're predicting.
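To make that concrete, here is a minimal sketch in plain Python. The 252-day year and the 80/20 split are assumptions chosen for illustration, not numbers from this page. It counts how many validation days are followed by at least one training day, first under a random split and then under a chronological one.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n_days = 252                 # assumption: roughly one year of trading days
days = list(range(n_days))   # day 0 is the first trading day of the year
n_train = int(0.8 * n_days)  # assumption: 80% of days go to training

# Random split, as in ordinary cross-validation.
shuffled = random.sample(days, k=n_days)
train_days = shuffled[:n_train]
valid_days = shuffled[n_train:]

# A validation day is "contaminated" if the model trained on any later day.
last_train_day = max(train_days)
share_random = sum(d < last_train_day for d in valid_days) / len(valid_days)

# Chronological split: train on the first 80% of days, validate on the rest.
chrono_train = days[:n_train]
chrono_valid = days[n_train:]
share_chrono = sum(d < max(chrono_train) for d in chrono_valid) / len(chrono_valid)

print(f"random split:        {share_random:.0%} of validation days precede a training day")
print(f"chronological split: {share_chrono:.0%} of validation days precede a training day")
```

Under the chronological split the share is zero by construction, which is the property a test of a forecasting model needs.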

So what do we do?

```{admonition} Rolling OOS testing

When your problem is a prediction problem (focused on $\hat{y}$, not $\beta$) and the ML model uses $X$ at time $t$ to predict $y$ at some future date $t+\Delta$, use a rolling out-of-sample test.

The procedure usually looks like this. Suppose you have 20 years of data.

  1. The holdout sample will be the last 3 years (15% of the data).
  2. Train and evaluate many models. For each model:
     1. Train a model on year 1, and predict outcomes in year 2.
     2. Then train a model on years 1-2, and predict outcomes in year 3.
     3. Then train a model on years 1-3, and predict outcomes in year 4.
     4. Then train a model on years 1-4, and predict outcomes in year 5.
     5. ...
     6. Then train a model on years 1-16, and predict outcomes in year 17.
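  3. Pick your preferred model, based on how well each one predicted years 2-17.
  4. Apply the preferred model to the holdout sample (years 18-20), but you only get to try this one time.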

```
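The rolling procedure above can be sketched in a few lines of plain Python. Everything named here is a hypothetical placeholder: the helper `expanding_window_splits`, the made-up yearly outcomes in `y`, and the two toy "models" (predict the historical mean, or predict last year's value) stand in for whatever data and estimators you actually have. Treat it as a sketch of the loop structure under those assumptions, not a recommended modeling approach.

```python
from statistics import mean

def expanding_window_splits(years, n_holdout):
    """Yield (train_years, test_year) pairs that never touch the holdout years."""
    usable = years[:len(years) - n_holdout]
    for i in range(1, len(usable)):
        yield usable[:i], usable[i]

# Made-up data: one outcome per year for years 1..20.
years = list(range(1, 21))
y = {t: 0.5 * t + (-1) ** t for t in years}

# Toy candidate models (hypothetical). Each takes the training outcomes,
# in time order, and returns a forecast for the next year.
def historical_mean(history):
    return mean(history)

def last_value(history):
    return history[-1]

models = {"historical_mean": historical_mean, "last_value": last_value}

# Step 2: for each model, train on years 1..t and predict year t+1.
squared_errors = {name: [] for name in models}
for train_years, test_year in expanding_window_splits(years, n_holdout=3):
    history = [y[t] for t in train_years]
    for name, predict in models.items():
        squared_errors[name].append((predict(history) - y[test_year]) ** 2)

# Step 3: pick the model with the lowest average out-of-sample error.
mse = {name: mean(errs) for name, errs in squared_errors.items()}
preferred = min(mse, key=mse.get)
print(mse)
print("preferred model:", preferred)
```

With 20 years and a 3-year holdout, the loop produces 16 train/test pairs: the first trains on year 1 and predicts year 2, and the last trains on years 1-16 and predicts year 17. If you use scikit-learn, its `TimeSeriesSplit` class generates the same kind of expanding-window splits for you.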