The emh package allows you to test the efficiency of any univariate zoo time series object in R.

The package achieves this using the following methodology:

- Downsample the data into a number of subfrequencies
- Run a suite of statistical tests of randomness on each subfrequency
- Aggregate the results (p values, Z scores, etc.) in a data.frame and return it
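
The three steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not emh's actual implementation: the prices are simulated, downsampling is done by keeping every k-th observation, only the Ljung-Box test (Box.test from the stats package) is run, a 1% significance level is assumed, and the names downsample_returns and sketch_results are hypothetical.

```r
# Illustrative sketch only: base R, simulated data, a single test (Ljung-Box).
set.seed(42)
prices <- cumprod(c(100, exp(rnorm(999, mean = 0, sd = 0.01))))  # simulated price path

# Step 1: downsample by keeping every k-th observation, then take log returns.
# (Assumption: emh may downsample differently.)
downsample_returns <- function(p, k) {
  idx <- seq(1, length(p), by = k)
  diff(log(p[idx]))
}

# Step 2: run a randomness test on each subfrequency.
# Step 3: collect the statistics and p-values into one data.frame.
lags <- c(1, 2, 5, 10)
sketch_results <- do.call(rbind, lapply(lags, function(k) {
  r <- downsample_returns(prices, k)
  test <- stats::Box.test(r, lag = 5, type = "Ljung-Box")
  data.frame(
    frequency  = k,
    n_obs      = length(r),
    statistic  = unname(test$statistic),
    p_value    = test$p.value,
    non_random = test$p.value < 0.01  # assumed 1% significance level
  )
}))

print(sketch_results)
```

Each row of the resulting data.frame corresponds to one subfrequency, which mirrors the shape of output described in the last step.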

In addition to randomness tests, emh also includes a number of stochastic process models.

The first step to using the package is to download and install it from GitHub using the devtools R package:

```
library(devtools)
suppressMessages(install_github(repo="stuartgordonreid/emh",
force = TRUE))
```

Now check that you can load the package:

```
suppressMessages(library(emh))
```

The emh package includes a few functions which allow you to download a bunch of global stock market indices from Quandl.com right off the bat. You can, of course, also pass in your own data. I recommend sticking with zoo objects when using emh because of how the downsampling works.

```
# This may take some time. Use names(global_indices) to list the datasets and the $ operator to select one.
global_indices <- emh::data_quandl_downloader(data_quandl_indices())
```
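
If you would rather pass in your own data, a univariate zoo series can be built from a numeric vector and a matching vector of dates. The snippet below is a minimal sketch: the dates and prices are made up for illustration, and my_series is a hypothetical name.

```r
library(zoo)

# Hypothetical price data; replace with your own observations.
dates  <- as.Date("2021-01-04") + 0:9
prices <- c(100.0, 100.8, 100.2, 101.5, 101.1, 102.3, 101.9, 102.8, 103.4, 103.0)

# A univariate zoo series: one numeric vector indexed by date.
my_series <- zoo(prices, order.by = dates)

head(my_series)
```

Keeping the dates attached to the observations is what makes calendar-based downsampling possible later on.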

Generating a data.frame with the results from each of the randomness tests is as easy as passing a zoo object into the is_random function in emh. This function will downsample the data into multiple lower frequencies and run a battery of tests on each subfrequency.

Frequencies are specified by the freqs1 and freqs2 arguments:

- freqs1 - this specifies the *lags* to use when computing returns
- freqs2 - this specifies time-aware lags. Options are: c("Mon", "Tue", "Wed", "Thu", "Fri", "Week", "Month")
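
To make the difference between the two kinds of frequency concrete, here is a base R sketch on simulated weekday prices. It only illustrates the idea of lag-based versus calendar-based returns; how emh computes them internally is not shown here, and all variable names are hypothetical.

```r
# Hypothetical daily series on weekdays only (simulated prices).
set.seed(1)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
dates <- dates[!(format(dates, "%u") %in% c("6", "7"))]  # drop Sat/Sun (ISO weekdays 6 and 7)
prices <- 100 * exp(cumsum(rnorm(length(dates), sd = 0.01)))

# Lag-style frequency (in the spirit of freqs1): log return from t-3 to t.
k <- 3
lag_returns <- diff(log(prices), lag = k)

# Time-aware frequency (in the spirit of freqs2 = "Mon"): Monday-to-Monday returns.
is_monday <- format(dates, "%u") == "1"
monday_returns <- diff(log(prices[is_monday]))

# Time-aware frequency (in the spirit of freqs2 = "Month"): month-end to month-end returns.
month_id  <- format(dates, "%Y-%m")
month_end <- as.numeric(tapply(prices, month_id, function(x) x[length(x)]))
monthly_returns <- diff(log(month_end))

c(lag = length(lag_returns),
  monday = length(monday_returns),
  month = length(monthly_returns))
```

A lag of k ignores the calendar entirely, whereas the time-aware options depend on the dates attached to each observation, which is one reason to keep the data in a date-indexed object. The actual call to is_random with both arguments set looks like this: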

```
results <- is_random(S = global_indices$'YAHOO/INDEX_SML',
                     a = 0.99, # 99% confidence level
                     freqs1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     freqs2 = c("Mon", "Tue", "Wed", "Thu", "Fri", "Week", "Month"))
```

You can now view the results (a data.frame object) or plot some interesting statistics.

```
head(results, 30)
```

```
plot_results(results)
```

In the two graphs above we can see a large number of non-random results at the $t-1$ to $t$ and $t-2$ to $t$ frequencies.

This might imply that the selected market, the S&P SmallCap 600 index, is non-random at those frequencies. The second graph shows that most of these results were produced by the Ljung-Box and Durbin-Watson statistical tests, which suggests that the data may contain serial correlations that are significantly different from zero. These tests do not tell us how *economically* significant the serial correlations are, nor do they tell us *where* in the data they were observed. These questions are best left to the intrepid quant trader to answer.
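
As a starting point for that kind of follow-up, the sketch below shows what the two tests respond to, using simulated data rather than the index itself. It assumes base R only: the Ljung-Box test comes from Box.test in the stats package, while durbin_watson and summarise_series are hypothetical helpers written for this example (the Durbin-Watson statistic is computed directly on the demeaned series, and is approximately $2(1 - \rho_1)$, where $\rho_1$ is the lag-1 autocorrelation).

```r
set.seed(7)
n <- 500
white_noise <- rnorm(n)                                               # no serial correlation
ar1 <- as.numeric(stats::arima.sim(model = list(ar = 0.3), n = n))    # AR(1) with coefficient 0.3

# Durbin-Watson statistic on a demeaned series: values near 2 suggest no
# first-order autocorrelation; values well below 2 suggest positive autocorrelation.
durbin_watson <- function(x) {
  e <- x - mean(x)
  sum(diff(e)^2) / sum(e^2)
}

# Report the size of the lag-1 autocorrelation next to the test outputs, since a
# small p-value alone says nothing about how large the correlation is.
summarise_series <- function(x) {
  lb <- stats::Box.test(x, lag = 10, type = "Ljung-Box")
  c(lag1_autocorr = stats::acf(x, lag.max = 1, plot = FALSE)$acf[2],
    ljung_box_p   = lb$p.value,
    durbin_watson = durbin_watson(x))
}

comparison <- rbind(white_noise = summarise_series(white_noise),
                    ar1         = summarise_series(ar1))
print(comparison)
```

Looking at the estimated autocorrelation alongside the p-value is one simple way to separate statistical significance from the magnitude of the effect.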