#!/usr/bin/env python
# coding: utf-8

# # Problem 5
# Let $ f(z) = 1 / \Gamma(z)$ where $ \Gamma(z) $ is the gamma function, and let $p(z)$ be the cubic polynomial that best approximates $f(z)$ on the unit disk in the supremum norm $\Vert{\cdot}\Vert_{\infty}$. What is $\Vert{f-p}\Vert_{\infty}$?

# ### Some mathematical preliminaries
# 
# Recall that the supremum norm of a function $f$ defined on domain $D$ is defined by $\Vert{f}\Vert_\infty := \max_{x \in D} |f(x)| $. 
# 
# If we let our optimal polynomial $p(x) := ax^3 + bx^2 + cx + d $, we have that $$ \epsilon := \Vert f-p \Vert_\infty = \max_{x \in D} |f(x)-p(x)| $$ 
# 
# where $D$ is the unit disk.
# 
# Now, note that because $1/\Gamma(x)$ and $p(x)$ are entire functions, we have that, by the [maximum principle](https://en.wikipedia.org/wiki/Maximum_modulus_principle) that the maximum lies on the boundary of $D$, i.e. the unit circle.
# 
# We therefore have 
# $$\epsilon = \min_{a,b,c,d \in \mathbb{R}} \max_{\theta \in [0,2 \pi)} \left|\frac{1}{\Gamma(e^{i\theta})}- \left(a e^{3i\theta} + b e^{2i\theta}  + c e^{i \theta} + d \right) \right| $$
# 
# ### Brute force
# Firstly, we attempt out-of-the-box minimization routines. For an initial guess, we use the coefficients of the Maclaurin series of $f(z)$, namely:
# 
# $$ f(z) \approx f_0(z) = f(0) + f'(0) z + \frac{f''(0)}{2!} z^2  + \frac{f'''(0)}{3!} z^3$$
# 

# In[1]:


import sympy
from sympy import gamma as sp_gamma
from sympy.abc import z

sympy.series(1/sp_gamma(z), z, n=4)


# We have that:
# 
# $$(a_0, b_0, c_0, d_0) = \left( \frac{\gamma^2}{2} - \frac{\pi^2}{12} , \gamma, 1, 0\right) \approx (-0.655878, 0.577216, 1, 0) $$
# 
# We first use [differential evolution](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html) to find a rough global minimum of the error:

# In[13]:


from scipy.special import gamma
from scipy.optimize import differential_evolution, minimize
from numpy import cos, sin, pi
from tqdm import tqdm_notebook as tqdm
import numpy as np

cis = lambda t: cos(t) + 1j*sin(t)

a0, b0, c0, d0 = (-0.655878, 0.577216, 1, 0)

g = lambda a,b,c,d: lambda t: abs( a*cis(3*t) + b*cis(2*t) + c*cis(t) + d - 1 / gamma(cis(t)))

def fn_to_minimize(args, N=1000):
    # print(args)
    a,b,c,d = args
    X = np.linspace(0, 2*pi, N)
    Y = g(a,b,c,d)(X)
    return Y.max()

rough_min = differential_evolution(fn_to_minimize, 
                                   bounds = [(-1.1,1.1) for i in range(4)], 
                                   tol=1e-6,
                                   x0=(a0,b0,c0,d0),
                                   seed = 42)

rough_min


# To refine this estimate, we then use [local minimization techniques](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) to get a better answer:

# In[15]:


from tqdm import tqdm_notebook as tqdm

def maximize_function(F, a, b):
    F = np.vectorize(F)
    i = 0
    while b-a > 1e-16:
        X = np.linspace(a, b, 1000)
        Y = F(X)
        X_max = X[Y.argmax()]
        a, b = X_max - (b-a)/3, X_max + (b-a)/3
        if (a == X_max - (b-a)/3): break
        i += 1
    # print(i)
    return (F(a) + F(b)) / 2

min_F_seen = 10000
iterations = tqdm()

def fn_to_minimize2(args):
    global min_F_seen, iterations
    a,b,c,d = args
    F = g(a,b,c,d)
    out = maximize_function(F, 0, 2*pi)
    if out < min_F_seen:
        print("minimum F seen: ", args, out)
        min_F_seen = out
    iterations.update()
    return out

minimize(fn_to_minimize2, x0 = rough_min.x, tol=1e-12) # 0.2143352345


# With these techniques, we are able to get 6 accurate digits. Note that this is especially difficult because we are trying to find the minimum of a maximum of a function - therefore, any numerical error will get compounded rapidly

# ### Better maximization
# 
# We next follow the solution submitted by [Kim McInturff and Peter Simon](https://web.archive.org/web/20040131081114/http://www.vcnet.com/~simonp/solutions.pdf). Namely, this approach gets rid of the numerical instability of calculating the inner maximum above.
# 
# Let $f(z) = \sum_{k=0}^N c_k z^k$ and $p(z) = \sum_{k=0}^3 a_k z^k $. We therefore have that the error $ E(z) = f(z) - p(z) = \sum_{k=0}^N E_k z^k $ is a polynomial in $z$.
# 
# We find the maximum of $|E(z)|^2$ analytically (and therefore $|E(z)|$ since it is a non-negative quantity). We have: 
# 
# $$ \begin{align*}
# |E(z)|^2 &= E(z)\overline{E(z)} \\
# &= \left(\sum_{k=0}^N E_k z^k \right) \overline{\left( \sum_{k=0}^N E_k z^k \right)} \\
# &= \left(\sum_{k=0}^N E_k z^k \right) \left( \sum_{k=0}^N E_k \left(\overline{z}\right)^k \right) \\
# &= \sum_{0\le p,q \le N} E_p E_q z^p \left(\overline{z}\right)^q \\
# &= \sum_{0\le p,q \le N} E_p E_q e^{(p-q)i\theta} & (z \mapsto e^{i\theta}) 
# \end{align*}
# $$
# 
# Therefore, a maximum must occur when $\frac{d}{d\theta} |E(z)|^2 = 0$. Taking the derivative, we have:
# 
# $$ \begin{align*}
# \frac{d}{d\theta} |E(z)|^2 &= \frac{d}{d\theta} \left( \sum_{0\le p,q \le N} E_p E_q e^{(p-q)i\theta} \right) \\
# &= \sum_{0\le p,q \le N} E_p E_q \left(\frac{d}{d\theta} e^{(p-q)i\theta}  \right) \\
# &= \sum_{0\le p,q \le N} E_p E_q \left(i (p-q) e^{(p-q)i\theta}\right) \\
# &= \sum_{0\le p,q \le N} E_p E_q \left(i (p-q) z^{p-q}\right) & (e^{i\theta} \mapsto z)\\
# \end{align*}
# $$ 
# 
# We therefore have:
# 
# $$ \frac{d}{d\theta} |E(z)|^2 = 0 \iff \sum_{0\le p,q \le N} E_p E_q (p-q) z^{p-q} = 0$$
# 
# Note that this series has some negative coefficients of $z^i$, so we multiply through by $z^N$. We can solve for the roots of this polynomial $P(z) \equiv \sum_{0\le p,q \le N} E_p E_q (p-q) z^{N+p-q} $ for the critical points of $|E(z)|^2$ and then locate the maximum. This avoids the floating point issues of finding the maximum we were dealing with earlier.
# 
# We find the coefficients $c_k$ for $f(z)$ via [the following recurrence](https://en.wikipedia.org/wiki/Reciprocal_gamma_function#Taylor_series):
# 
# $$\begin{cases}
# c_0, c_1, c_2 = 0, 1, \gamma \\
# c_n = \frac{c_2 c_{n-1} + \sum_{j=2}^{n-1} (-1)^{j+1} \zeta(j) c_{n-j}}{n-1}
# \end{cases}
# $$

# In[134]:


import mpmath
import numpy as np
from numpy.polynomial import Polynomial
from collections import defaultdict

N = 30

mpmath.mp.dps = 50

gamma = mpmath.mp.euler
coeffs = [0, 1, gamma]
for n in range(3, N+1):
    next_coeff = (coeffs[2]*coeffs[n-1] + sum([pow(-1, j+1)*mpmath.zeta(j)*coeffs[n-j] for j in range(2,n)])) / (n-1)
    coeffs.append(next_coeff)
f_coeffs = [np.longdouble(str(i)) for i in coeffs]

a0, b0, c0, d0 = f_coeffs[:4][::-1]

min_F_seen = 10000
DIGITS = 0

def fn_to_minimize3(args):
    global min_F_seen, DIGITS
    p = list(args[::-1]) + [0 for i in range(N-3)]
    # Calculate coefficients of error
    E_coeffs = [f_i-p_i for f_i,p_i in zip(f_coeffs,p)]
    # Calculate p
    P_arr = np.zeros(2*N+1, dtype=float)
    for p in range(N+1):
        for q in range(N+1):
            P_arr[N+p-q] += E_coeffs[p]*E_coeffs[q]*(p-q)
    # Get roots of P
    p = Polynomial(P_arr)
    roots = p.roots()
    # Get max from critical points
    E = lambda z: sum([coeff*pow(z,n) for n,coeff in enumerate(E_coeffs)])
    potential_maxs = [ abs(E(r)) for r in roots if 0.95<=abs(r)<=1.05]
    out = max(potential_maxs)
    if out < min_F_seen:
        if abs(min_F_seen - out) <= pow(10, -(DIGITS+1)):
            print(f"new minimum: E({args}) = {min_F_seen}")
            DIGITS += 1
        min_F_seen = out
    return out

differential_evolution(fn_to_minimize3, 
                       bounds=[(-1.1,1.1) for i in range(4)], 
                       x0=(a0,b0,c0,d0), 
                       tol=1e-12,
                      seed=42)


# With this approach, we are able to get more than 10 digits of the answer.
# 
# ### Addendum
# 
# Note that this problem is a general case of finding the best approximation to a function in the supremum norm - in such cases, usually the [Remez algorithm](https://en.wikipedia.org/wiki/Remez_algorithm) is used. However, I was unable to find a Python implementation of this algorithm and my own implementation of the algorithm failed to work.