Let $ f(z) = 1 / \Gamma(z)$ where $ \Gamma(z) $ is the gamma function, and let $p(z)$ be the cubic polynomial that best approximates $f(z)$ on the unit disk in the supremum norm $\Vert{\cdot}\Vert_{\infty}$. What is $\Vert{f-p}\Vert_{\infty}$?
Recall that the supremum norm of a function $f$ on a compact domain $D$ is given by $\Vert{f}\Vert_\infty := \max_{z \in D} |f(z)|$.
If we let our optimal polynomial be $p(z) := az^3 + bz^2 + cz + d$, we have $$ \epsilon := \Vert f-p \Vert_\infty = \max_{z \in D} |f(z)-p(z)| $$
where $D$ is the unit disk.
Now, since $1/\Gamma(z)$ and $p(z)$ are entire, their difference is entire as well, and the maximum modulus principle tells us that the maximum of $|f(z)-p(z)|$ over $D$ is attained on the boundary of $D$, i.e. the unit circle.
We therefore have $$\epsilon = \min_{a,b,c,d \in \mathbb{R}} \max_{\theta \in [0,2 \pi)} \left|\frac{1}{\Gamma(e^{i\theta})}- \left(a e^{3i\theta} + b e^{2i\theta} + c e^{i \theta} + d \right) \right| $$
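As a quick numerical sanity check of this reduction (my addition, using the degree-1 truncation $p(z) = z$ as a stand-in for the cubic), the maximum of $|f - p|$ over samples of the disk should agree with the maximum over the circle alone:
from scipy.special import gamma
import numpy as np
p = lambda z: z  # any polynomial candidate works for this check
rng = np.random.default_rng(0)
r, t = np.sqrt(rng.random(10**5)), 2*np.pi*rng.random(10**5)
disk = r * np.exp(1j*t)  # (approximately) uniform samples of the unit disk
circle = np.exp(2j*np.pi*np.linspace(0, 1, 10**5))
err = lambda z: np.abs(1/gamma(z) - p(z))
print(err(disk).max(), err(circle).max())  # circle max should match and (weakly) dominate the disk samples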
First, we try out-of-the-box minimization routines. For an initial guess, we use the coefficients of the Maclaurin series of $f(z)$, namely:
$$ f(z) \approx f_0(z) = f(0) + f'(0) z + \frac{f''(0)}{2!} z^2 + \frac{f'''(0)}{3!} z^3$$
import sympy
from sympy import gamma as sp_gamma
from sympy.abc import z
sympy.series(1/sp_gamma(z), z, n=4)
We have that:
$$(a_0, b_0, c_0, d_0) = \left( \frac{\gamma^2}{2} - \frac{\pi^2}{12} , \gamma, 1, 0\right) \approx (-0.655878, 0.577216, 1, 0) $$
We first use differential evolution to find a rough global minimum of the error:
from scipy.special import gamma
from scipy.optimize import differential_evolution, minimize
from numpy import cos, sin, pi
from tqdm.notebook import tqdm
import numpy as np
cis = lambda t: cos(t) + 1j*sin(t)  # e^{it}
a0, b0, c0, d0 = (-0.655878, 0.577216, 1, 0)
# |p(e^{it}) - f(e^{it})| as a function of t, for given coefficients a, b, c, d
g = lambda a, b, c, d: lambda t: abs(a*cis(3*t) + b*cis(2*t) + c*cis(t) + d - 1/gamma(cis(t)))
def fn_to_minimize(args, N=1000):
    a, b, c, d = args
    # sample |f - p| at N points of the unit circle and take the maximum
    X = np.linspace(0, 2*pi, N)
    Y = g(a, b, c, d)(X)
    return Y.max()
rough_min = differential_evolution(fn_to_minimize,
                                   bounds=[(-1.1, 1.1) for i in range(4)],
                                   tol=1e-6,
                                   x0=(a0, b0, c0, d0),
                                   seed=42)
rough_min
     fun: 0.2143352172675715
     jac: array([0.86039537, 0.64650538, 0.09199849, 0.63088437])
 message: 'Optimization terminated successfully.'
    nfev: 7550
     nit: 121
 success: True
       x: array([-0.60337732,  0.62514893,  1.01969876,  0.0055078 ])
To refine this estimate, we then use local minimization techniques to get a better answer:
from tqdm.notebook import tqdm
def maximize_function(F, a, b):
    # find the maximum of F on [a, b]: repeatedly sample a grid and zoom in
    # around the best sample until the bracket stops shrinking
    F = np.vectorize(F)
    while b - a > 1e-16:
        X = np.linspace(a, b, 1000)
        Y = F(X)
        X_max = X[Y.argmax()]
        a, b = X_max - (b-a)/3, X_max + (b-a)/3
        if a == X_max - (b-a)/3:  # bracket no longer shrinks in floating point
            break
    return (F(a) + F(b)) / 2
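As a quick test of this helper (my addition, not in the original notebook), the maximum of $\sin$ over $[0, 2\pi]$ should come out as $1$:
print(maximize_function(np.sin, 0, 2*np.pi))  # should print ~1.0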
min_F_seen = 10000
iterations = tqdm()
def fn_to_minimize2(args):
    global min_F_seen, iterations
    a, b, c, d = args
    F = g(a, b, c, d)
    out = maximize_function(F, 0, 2*pi)
    if out < min_F_seen:
        print("minimum F seen: ", args, out)
        min_F_seen = out
    iterations.update()
    return out
minimize(fn_to_minimize2, x0 = rough_min.x, tol=1e-12) # 0.2143352345
minimum F seen:  [-0.60337732  0.62514893  1.01969876  0.0055078 ] 0.2143355714038389
minimum F seen:  [-0.60337732  0.62514893  1.01969878  0.0055078 ] 0.21433556169152873
minimum F seen:  [-0.60337732  0.62514893  1.01969876  0.00550781] 0.2143355586581085
minimum F seen:  [-0.60337752  0.62514878  1.01969892  0.00550799] 0.21433538060117682
minimum F seen:  [-0.60337752  0.62514878  1.01969892  0.00550799] 0.21433538060117652
minimum F seen:  [-0.60337752  0.62514878  1.01969892  0.00550799] 0.2143353806011763
      fun: 0.21433538060117682
 hess_inv: array([[ 1.06212648,  0.14313859,  0.29009927,  0.16611956],
       [ 0.14313859,  1.17835123,  0.14218146,  0.02721326],
       [ 0.29009927,  0.14218146,  0.52621387, -0.45964763],
       [ 0.16611956,  0.02721326, -0.45964763,  0.60953961]])
      jac: array([0.24456594, 0.4963477 , 0.90874575, 0.4963477 ])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 272
      nit: 1
     njev: 52
   status: 2
  success: False
        x: array([-0.60337752,  0.62514878,  1.01969892,  0.00550799])
With these techniques, we obtain about 6 accurate digits. Note that this problem is especially difficult numerically because we are minimizing a maximum: any error in evaluating the inner maximum propagates directly into the outer minimization.
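To see the difficulty concretely, here is a small sketch (my addition, reusing `fn_to_minimize` and `rough_min` from above): since the objective is an inner maximum over $\theta$, it is only piecewise smooth in the coefficients, and the one-sided slopes at a minimum generally disagree.
ts = np.linspace(-1e-3, 1e-3, 201)
direction = np.array([1.0, 0.0, 0.0, 0.0])  # perturb only the z^3 coefficient
vals = np.array([fn_to_minimize(rough_min.x + t*direction) for t in ts])
i = vals.argmin()  # assumes the minimum falls strictly inside the sampled range
left = (vals[i] - vals[i-1]) / (ts[i] - ts[i-1])
right = (vals[i+1] - vals[i]) / (ts[i+1] - ts[i])
print(left, right)  # these typically differ: the objective has a kink at the optimum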
We next follow the solution submitted by Kim McInturff and Peter Simon, which eliminates the numerical instability of computing the inner maximum above.
Since $f$ is entire, we may truncate its Maclaurin series at some large degree $N$ and write $f(z) \approx \sum_{k=0}^N c_k z^k$; let $p(z) = \sum_{k=0}^3 a_k z^k$. The error $E(z) = f(z) - p(z) = \sum_{k=0}^N E_k z^k$ is then itself a polynomial in $z$, with $E_k = c_k - a_k$ for $k \le 3$ and $E_k = c_k$ otherwise.
We find the maximum of $|E(z)|^2$ analytically (and therefore that of $|E(z)|$, since squaring is monotone on non-negative quantities). We have:
$$ \begin{align*} |E(z)|^2 &= E(z)\overline{E(z)} \\ &= \left(\sum_{k=0}^N E_k z^k \right) \overline{\left( \sum_{k=0}^N E_k z^k \right)} \\ &= \left(\sum_{k=0}^N E_k z^k \right) \left( \sum_{k=0}^N E_k \left(\overline{z}\right)^k \right) & (\text{the } E_k \text{ are real}) \\ &= \sum_{0\le p,q \le N} E_p E_q z^p \left(\overline{z}\right)^q \\ &= \sum_{0\le p,q \le N} E_p E_q e^{(p-q)i\theta} & (z \mapsto e^{i\theta}) \end{align*} $$
Therefore, a maximum must occur where $\frac{d}{d\theta} |E(z)|^2 = 0$. Taking the derivative, we have:
$$ \begin{align*} \frac{d}{d\theta} |E(z)|^2 &= \frac{d}{d\theta} \left( \sum_{0\le p,q \le N} E_p E_q e^{(p-q)i\theta} \right) \\ &= \sum_{0\le p,q \le N} E_p E_q \left(\frac{d}{d\theta} e^{(p-q)i\theta} \right) \\ &= \sum_{0\le p,q \le N} E_p E_q \left(i (p-q) e^{(p-q)i\theta}\right) \\ &= \sum_{0\le p,q \le N} E_p E_q \left(i (p-q) z^{p-q}\right) & (e^{i\theta} \mapsto z) \end{align*} $$
We therefore have:
$$ \frac{d}{d\theta} |E(z)|^2 = 0 \iff \sum_{0\le p,q \le N} E_p E_q (p-q) z^{p-q} = 0$$
Note that this expression contains negative powers of $z$, so we multiply through by $z^N$ to obtain a polynomial $P(z) := \sum_{0\le p,q \le N} E_p E_q (p-q) z^{N+p-q}$. Its roots on the unit circle are the critical points of $|E(z)|^2$, among which we locate the maximum. This avoids the floating-point issues of the grid-based maximization we were dealing with earlier.
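Before plugging this in, here is a small numerical check of the derivation (my addition, with made-up sample coefficients $E_k$): the unit-modulus roots of $P$ should recover the brute-force maximum of $|E|$ on the circle.
import numpy as np
from numpy.polynomial import Polynomial
E_coeffs = np.array([0.0055, 1.0198, 0.6252, -0.6033])  # hypothetical sample E_k
M = len(E_coeffs) - 1
P_arr = np.zeros(2*M + 1)
for p in range(M + 1):
    for q in range(M + 1):
        P_arr[M + p - q] += E_coeffs[p]*E_coeffs[q]*(p - q)
roots = Polynomial(P_arr).roots()
E = Polynomial(E_coeffs)
analytic = max(abs(E(r)) for r in roots if abs(abs(r) - 1) < 1e-8)
theta = np.linspace(0, 2*np.pi, 10**5)
brute = np.abs(E(np.exp(1j*theta))).max()
print(brute, analytic)  # should agree to ~8 digits (the grid limits the brute-force value)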
We find the coefficients $c_k$ for $f(z)$ via the following recurrence:
$$\begin{cases} c_0, c_1, c_2 = 0, 1, \gamma \\ c_n = \frac{c_2 c_{n-1} + \sum_{j=2}^{n-1} (-1)^{j+1} \zeta(j) c_{n-j}}{n-1} \end{cases} $$
import mpmath
import numpy as np
from numpy.polynomial import Polynomial
from collections import defaultdict
N = 30
mpmath.mp.dps = 50
gamma = mpmath.mp.euler
coeffs = [0, 1, gamma]
for n in range(3, N+1):
    next_coeff = (coeffs[2]*coeffs[n-1] + sum(pow(-1, j+1)*mpmath.zeta(j)*coeffs[n-j] for j in range(2, n))) / (n-1)
    coeffs.append(next_coeff)
f_coeffs = [np.longdouble(str(i)) for i in coeffs]
a0, b0, c0, d0 = f_coeffs[:4][::-1]
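As a sanity check of this recurrence (my addition, not in the original notebook), the truncated series should reproduce $1/\Gamma$ inside the disk; note the name `gamma` was rebound to Euler's constant above, so we call `mpmath.gamma` directly:
z_test = 0.5
series_val = sum(c * z_test**k for k, c in enumerate(f_coeffs))
print(series_val, 1/mpmath.gamma(z_test))  # both should be ~0.5641895835 = 1/sqrt(pi)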
min_F_seen = 10000
DIGITS = 0
def fn_to_minimize3(args):
    global min_F_seen, DIGITS
    # coefficients of p in ascending order, padded with zeros up to degree N
    p_coeffs = list(args[::-1]) + [0 for i in range(N-3)]
    # coefficients of the error E = f - p
    E_coeffs = [f_i - p_i for f_i, p_i in zip(f_coeffs, p_coeffs)]
    # coefficients of P(z) = sum_{p,q} E_p E_q (p-q) z^{N+p-q}
    P_arr = np.zeros(2*N+1, dtype=float)
    for p in range(N+1):
        for q in range(N+1):
            P_arr[N+p-q] += E_coeffs[p]*E_coeffs[q]*(p-q)
    # roots of P that lie (numerically) on the unit circle are the critical
    # points of |E|^2; the maximum of |E| is attained at one of them
    roots = Polynomial(P_arr).roots()
    E = lambda z: sum(coeff*pow(z, n) for n, coeff in enumerate(E_coeffs))
    potential_maxs = [abs(E(r)) for r in roots if 0.95 <= abs(r) <= 1.05]
    out = max(potential_maxs)
    if out < min_F_seen:
        if abs(min_F_seen - out) <= pow(10, -(DIGITS+1)):
            print(f"new minimum: E({args}) = {min_F_seen}")
            DIGITS += 1
        min_F_seen = out
    return out
differential_evolution(fn_to_minimize3,
                       bounds=[(-1.1, 1.1) for i in range(4)],
                       x0=(a0, b0, c0, d0),
                       tol=1e-12,
                       seed=42)
new minimum: E([-0.61709254  0.63484243  1.04677988  0.01128302]) = 0.2354690226759797
new minimum: E([-0.61392552  0.60879034  1.02295489  0.01537967]) = 0.23177731011348132
new minimum: E([-0.6154001   0.61806856  1.00238181 -0.00354295]) = 0.22808068663267253
new minimum: E([-0.6059147   0.61938119  1.02002916  0.00268617]) = 0.2207281019321637
new minimum: E([-0.60481986  0.62270918  1.01769605  0.00498882]) = 0.2148275797479694
new minimum: E([-0.60376176  0.62427757  1.01880487  0.0051676 ]) = 0.21440289839988855
new minimum: E([-0.60325694  0.62537509  1.0199267   0.00562997]) = 0.21433660211622288
new minimum: E([-0.60328167  0.62532152  1.01987165  0.00560401]) = 0.21433556274375193
new minimum: E([-0.60334874  0.62520381  1.01975375  0.00553643]) = 0.2143352441053902
new minimum: E([-0.60334201  0.62521689  1.01976683  0.00554315]) = 0.21433523787921516
new minimum: E([-0.60334067  0.62521834  1.01976828  0.0055445 ]) = 0.2143352362675276
new minimum: E([-0.60334324  0.62521188  1.01976182  0.00554194]) = 0.2143352345948142
new minimum: E([-0.60334318  0.62521201  1.01976195  0.00554199]) = 0.21433523459104564
new minimum: E([-0.60334322  0.62521191  1.01976185  0.00554195]) = 0.21433523459059878
     fun: 0.21433523459048126
 message: 'Optimization terminated successfully.'
    nfev: 17145
     nit: 283
 success: True
       x: array([-0.60334323,  0.6252119 ,  1.01976184,  0.00554194])
With this approach, we obtain more than 10 correct digits of the answer.
Note that this problem is an instance of the general problem of finding the best approximation to a function in the supremum norm; in such cases the Remez (exchange) algorithm is usually used. However, I was unable to find a Python implementation of this algorithm, and my own implementation of it failed to work.
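For reference, here is a minimal sketch of the classical single-point Remez exchange in its standard setting, best real approximation on an interval. This is my own illustration under stated assumptions (a smooth $f$, a dense evaluation grid), not the failed implementation mentioned above nor a circle-domain method; adapting the exchange step to the unit circle is precisely the hard part. The name `remez` and all parameters are hypothetical.
import numpy as np

def remez(f, deg, a=-1.0, b=1.0, n_iter=50, grid_size=20000):
    # reference of deg+2 points at which the error should equioscillate
    m = deg + 2
    x = np.sort((a+b)/2 + (b-a)/2 * np.cos(np.pi*np.arange(m)/(m-1)))
    grid = np.linspace(a, b, grid_size)
    for _ in range(n_iter):
        # solve p(x_i) + (-1)^i h = f(x_i) for the coefficients of p and the level h
        A = np.column_stack([np.vander(x, deg+1, increasing=True),
                             (-1.0)**np.arange(m)])
        sol = np.linalg.solve(A, f(x))
        coef, h = sol[:deg+1], sol[deg+1]
        err = f(grid) - np.vander(grid, deg+1, increasing=True) @ coef
        i_worst = np.abs(err).argmax()
        if np.abs(err[i_worst]) - abs(h) < 1e-12:
            break  # equioscillation reached (up to the grid resolution)
        # single-point exchange: swap the worst grid point into the reference,
        # replacing a point so that the error signs still alternate
        x_new, s = grid[i_worst], np.sign(err[i_worst])
        ref_err = f(x) - np.vander(x, deg+1, increasing=True) @ coef
        j = np.searchsorted(x, x_new)
        if j == 0:
            if np.sign(ref_err[0]) == s:
                x[0] = x_new
            else:
                x = np.concatenate([[x_new], x[:-1]])
        elif j == m:
            if np.sign(ref_err[-1]) == s:
                x[-1] = x_new
            else:
                x = np.concatenate([x[1:], [x_new]])
        else:
            x[j-1 if np.sign(ref_err[j-1]) == s else j] = x_new
    return coef, abs(h)

coef, eps = remez(np.exp, 3)  # best cubic to exp on [-1, 1]
print(eps)  # the leveled error, i.e. the minimax error up to grid resolution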