#!/usr/bin/env python
# coding: utf-8

# ## Improved estimation of Signal Noise Ratio via moments
#
# Let $\mu$ and $\sigma$ be the mean and standard deviation of the returns of an asset. Then $\zeta = \mu / \sigma$ is the "Signal to Noise Ratio" (SNR). Typically the SNR is estimated with the _Sharpe Ratio_, defined as $\hat{\zeta} = \hat{\mu} / \hat{\sigma}$, where $\hat{\mu}$ and $\hat{\sigma}$ are the vanilla sample estimates. Can we gain efficiency in the case where the returns have significant skew and kurtosis?
#
# Here we consider an estimator of the form
# $$
# v = a_0 + \frac{a_1 + \left(1+a_2\right)\hat{\mu} + a_3 \hat{\mu}^2}{\hat{\sigma}} + a_4 \left(\frac{\hat{\mu}}{\hat{\sigma}}\right)^2.
# $$
# The Sharpe Ratio corresponds to $a_0 = a_1 = a_2 = a_3 = a_4 = 0$. Note that we were inspired by
# Norman Johnson's [work on t-tests under skewed distributions](http://www.jstor.org/stable/2286597). Johnson considered a similar setup, but with only $a_1, a_2,$ and $a_3$ free, and was concerned with the problem of hypothesis testing on $\mu$.
#
# Below, following Johnson, I will use the Cornish Fisher expansions of $\hat{\mu}$ and $\hat{\sigma}^2$ to approximate $v$ as a function of the first few cumulants of the distribution and some normal variates. I will then compute the mean square error, $E\left[\left(v - \zeta\right)^2\right],$ and take its derivative with respect to the $a_i$. Unfortunately, we will find that the first order conditions are solved by $a_i=0$, which is to say that the vanilla Sharpe has the lowest MSE of estimators of this kind. Our adventure will take us far, but we will return home empty handed.
#
# We proceed.

# In[1]:

# load what we need from sympy
from __future__ import division
from sympy import *
from sympy import Order
from sympy.assumptions.assume import global_assumptions
from sympy.stats import P, E, variance, Normal
init_printing()

nfactor = 4

# define some symbols.
a0, a1, a2, a3, a4 = symbols('a_0 a_1 a_2 a_3 a_4', real=True)
n, sigma = symbols('n \\sigma', real=True, positive=True)
zeta, mu3, mu4 = symbols('\\zeta \\mu_3 \\mu_4', real=True)
mu = zeta * sigma


# We now express $\hat{\mu}$ and $\hat{\sigma}^2$ by the [Cornish Fisher expansion](https://en.wikipedia.org/wiki/Cornish%E2%80%93Fisher_expansion). This is an expression of the distribution of a random variable in terms of its cumulants and a normal variate. The expansion is ordered such that, when applied to the _mean_ of independent draws of a distribution, the terms are clustered by their order in $n$. The Cornish Fisher expansion also involves the [Hermite polynomials](https://en.wikipedia.org/wiki/Hermite_polynomials). The expansions of $\hat{\mu}$ and $\hat{\sigma}^2$ are not independent.
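# (The correlation between the two can be anticipated with a quick leading-order computation: $\mathrm{cov}\left(\hat{\mu}, \hat{\sigma}^2\right) \approx \mu_3 / n$, $\mathrm{var}\left(\hat{\mu}\right) = \sigma^2 / n$, and $\mathrm{var}\left(\hat{\sigma}^2\right) \approx \left(\mu_4 - \sigma^4\right) / n$, so the correlation is approximately
# $$
# \rho \approx \frac{\mu_3}{\sqrt{\sigma^2 \left(\mu_4 - \sigma^4\right)}},
# $$
# which is the value quoted in the comments of the next cell.)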
# We follow Johnson in expressing the correlation of the normals and truncating:

# In[2]:

# probabilists' Hermite polynomials, via sympy's (physicists') hermite
def Hen(x, k):
    return 2**(-Rational(k, 2)) * hermite(k, x / sqrt(2))

# these adjustment terms come out of the wikipedia page:
h1 = lambda x: Hen(x, 2) / 6
h2 = lambda x: Hen(x, 3) / 24
h11 = lambda x: -(2 * Hen(x, 3) + Hen(x, 1)) / 36

# mu3 is the 3rd centered moment of x, so mu3 / sigma**3 is the skew;
# these are the skew and kurtosis terms for the mean of n draws:
gamma1 = (mu3 / (sigma**3)) / sqrt(n)
gamma2 = (mu4 / (sigma**4)) / n

# grab two normal variates with correlation rho,
# which happens to take value:
# rho = mu3 / sqrt(sigma**2 * (mu4 - sigma**4))
z1 = Normal('z_1', 0, 1)
z3 = Normal('z_3', 0, 1)
rho = symbols('\\rho', real=True)
z2 = rho * z1 + sqrt(1 - rho**2) * z3

# this is out of Johnson, but we call it mu hat instead of x bar:
muhat = mu + (sigma / sqrt(n)) * (z1 + gamma1 * h1(z1) + gamma2 * h2(z1) + gamma1**2 * h11(z1))
muhat


# In[3]:

addo = sqrt((mu4 - sigma**4) / (n * sigma**4)) * z2
# this is s^2 in Johnson:
sighat2 = (sigma**2) * (1 + addo)
# use Taylor's theorem to express sighat^-1:
# 1/s = (1/sigma) * (1 + addo)**(-1/2), approximately (1/sigma) * (1 - addo/2)
invs = (sigma**(-1)) * (1 - addo / 2)
invs


# In[4]:

# the new statistic; it is v = part1 + part2 + part3
part1 = a0
part2 = (a1 + (1 + a2) * muhat + a3 * muhat**2) * invs
part3 = a4 * (muhat * invs)**2
v = part1 + part2 + part3
v


# That's a bit hairy. Here I truncate that statistic in powers of $n$. This was hard for me to figure out in sympy, so I took a limit. (I like how 'oo' is infinity in sympy.)

# In[5]:

# keep the constant term and the n**(-1/2) term of v:
v_0 = limit(v, n, oo)
v_05 = v_0 + (limit(sqrt(n) * (v - v_0), n, oo) / sqrt(n))
v_05


# Now we define the error as $v - \zeta$ and compute the approximate bias and variance of the error. We sum the variance and squared bias to get the mean square error.

# In[6]:

staterr = v_05 - zeta
# the mean squared error of the statistic v is
#   MSE = E((v - zeta)**2)
# computing that directly is too slow, though, so evaluate bias and variance separately instead:
bias = E(staterr)
simplify(bias)


# In[7]:

# variance of the error:
varerr = variance(staterr)
MSE = (bias**2) + varerr
collect(MSE, n)


# That's really involved, and finding the derivative will be ugly. Instead we truncate in $n$: below we compute both the constant term, MSE_0, and the version carrying the $n^{-1}$ correction, MSE_1, but only minimize the former. Looking above, you will see that dropping the terms in $n^{-1}$ leaves some quantity squared, namely the squared leading-order bias, $\left(a_0 + a_1/\sigma + a_2 \zeta + a_3 \sigma \zeta^2 + a_4 \zeta^2\right)^2$. That is what we will minimize. The way forward is fairly clear from here.

# In[8]:

# truncate!
MSE_0 = limit(collect(MSE, n), n, oo)
MSE_1 = MSE_0 + (limit(n * (MSE - MSE_0), n, oo) / n)
MSE_0


# Now we take the derivative of the Mean Square Error with respect to the $a_i$. In each case we get an expression linear in all the $a_i$. The first order conditions, which correspond to minimizing the MSE, are satisfied at $a_i=0$.

# In[9]:

# a_0
simplify(diff(MSE_0, a0))


# In[10]:

# a_1
simplify(diff(MSE_0, a1))


# In[11]:

# a_2
simplify(diff(MSE_0, a2))


# In[12]:

# a_3
simplify(diff(MSE_0, a3))


# In[13]:

# a_4
simplify(diff(MSE_0, a4))


# To recap, the minimal MSE occurs for $a_0 = a_1 = a_2 = a_3 = a_4 = 0$. We must try another approach.
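# Before we do, a quick sanity check: substituting $a_i = 0$ into each of the partial derivatives above should give zero, confirming that the vanilla Sharpe Ratio satisfies the first order conditions. (This is only a minimal check of the algebra above.)

# In[14]:

# substitute a_0 = a_1 = a_2 = a_3 = a_4 = 0 into the gradient of the truncated MSE;
# each entry should simplify to zero
zero_as = {a0: 0, a1: 0, a2: 0, a3: 0, a4: 0}
[simplify(diff(MSE_0, ai).subs(zero_as)) for ai in (a0, a1, a2, a3, a4)]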