In the problem, you proved that $\Vert B \Vert_p \le \Vert A \Vert_p$, where $B$ is any submatrix of $A$, for any induced $p$-norm ($p \ge 1$).
I asked you to check your result in Julia — it is a good habit to try out matrix properties on random matrices, both to check for mistakes and to get used to matrix calculations.
A = rand(10,7)
10×7 Array{Float64,2}:
 0.172788  0.240478  0.34862   0.0701199  0.0667866  0.224837   0.466168
 0.229774  0.396172  0.70968   0.0617452  0.94947    0.0774769  0.714809
 0.051449  0.977165  0.873841  0.0502234  0.833103   0.171178   0.8879
 0.985075  0.771922  0.791837  0.679997   0.951395   0.406127   0.820381
 0.989578  0.677454  0.960865  0.992694   0.52531    0.770805   0.201112
 0.881249  0.784332  0.725263  0.381246   0.908879   0.570349   0.343798
 0.20466   0.858549  0.893738  0.334795   0.976062   0.706054   0.426582
 0.890724  0.996904  0.723399  0.988638   0.555659   0.39784    0.870908
 0.819373  0.833039  0.223788  0.163779   0.925095   0.549572   0.467481
 0.369941  0.583348  0.474652  0.847811   0.594631   0.718061   0.606008
using LinearAlgebra # for opnorm
B = A[2:5, 3:7] # a submatrix
# the ratio ‖B‖/‖A‖ for the induced p=1,2,∞ norms:
[opnorm(B,p) / opnorm(A,p) for p in (1,2,Inf)]
3-element Array{Float64,1}:
 0.4578705281117504
 0.5597020995891572
 0.6728779041971168
Yay, the ratio is ≤ 1 as desired.
It is okay if you just checked the p=2 norm for a single submatrix.
If we want, we could compute the ratio for every contiguous non-empty submatrix $A[a:b, c:d]$ for $p=2$:
ratio = Float64[]
for a = 1:10, b = a:10, c = 1:7, d = c:7
    push!(ratio, opnorm(A[a:b, c:d]) / opnorm(A))
end
minimum(ratio), maximum(ratio)
(0.009621631263303159, 1.0)
So, $0.00962 \le \Vert B \Vert_2 / \Vert A \Vert_2 \le 1$ for all possible submatrices of this $A$, consistent with your proved result.
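Note that the loops above only generate contiguous submatrices, whereas the result you proved holds for arbitrary subsets of rows and columns. As a quick illustrative spot-check (not part of the original solution), we can also try 100 random non-contiguous submatrices, here 5 random rows × 4 random columns:
using Random # for shuffle
all(opnorm(A[shuffle(1:10)[1:5], shuffle(1:7)[1:4]]) ≤ opnorm(A) for _ = 1:100) # should print: true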
Here, we want to minimize $$ \int_a^b \left|\frac{1}{x} - f(x)\right|^2 dx $$ over all possible coefficients $c$ of $$ f(x) = c_1 e^x + c_2 \sin(x) + c_3 \Gamma(x), $$ to at least 1% accuracy, for an interval $(a,b) = (1,2)$ in part (a) and $(a,b) = (0,1)$ in part (b).
There are several ways to approach this problem, depending on how we want to compute the integrals.
On a computer, we typically need to approximate an integral $\int_a^b g(x)dx$ by a quadrature rule of the form: $$ \int_a^b g(x)dx \approx \sum_{k=1}^m w_k g(x_k) $$ where $x_k$ are the $m$ quadrature points and $w_k$ are the quadrature weights. We will have more to say on this subject later in 18.06, but for now let's just use the simplest possible quadrature rule — the one that most of you learned when integration was first defined for you, a Riemann sum: $$ \int_a^b g(x)dx \approx \sum_{k=1}^m g(a + k\Delta x) \, \Delta x, $$ where $\Delta x = (b - a)/m$. That is, we just approximate the integral as a sum of rectangles. If we choose a large enough $m$, this is sufficiently accurate.
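As a quick aside (an illustrative sketch, not part of the pset solution), we can sanity-check this Riemann-sum rule on an integral we know exactly, $\int_0^1 x^2 dx = 1/3$:
# right-endpoint Riemann sum of g over (a,b): sum of g(a+k*Δx)*Δx with Δx = (b-a)/m
riemann(g, a, b, m) = sum(g(a + k*(b-a)/m) for k in 1:m) * (b-a)/m
riemann(x -> x^2, 0, 1, 1000) # ≈ 0.33383 vs. the exact 1/3: an O(1/m) error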
Once we do this, the functional minimization problem becomes an ordinary matrix least-squares problem: $$ \int_a^b \left|\frac{1}{x} - f(x)\right|^2 dx \approx \Vert b - Ac\Vert_2^2 \, \Delta x $$ where $c = (c_1,c_2,c_3)$ (column vector), $$ A = \begin{pmatrix} e^{x_1} & \sin(x_1) & \Gamma(x_1) \\ e^{x_2} & \sin(x_2) & \Gamma(x_2) \\ \vdots & \vdots & \vdots \\ e^{x_m} & \sin(x_m) & \Gamma(x_m) \end{pmatrix}, \qquad b = \begin{pmatrix} 1/x_1 \\ 1/x_2 \\ \vdots \\ 1/x_m \end{pmatrix}, $$ with $x_k = a + k\Delta x$. (The overall $\Delta x$ factor does not change the minimizer, so we can simply minimize $\Vert b - Ac \Vert_2$.)
We can then solve this with c = A \ b in Julia (which uses a variant of Householder QR).
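(As an illustrative aside: on a small random problem, we can confirm that backslash agrees with an explicit Householder QR least-squares solve. The names Asmall and bsmall are invented just for this demo.)
Asmall = rand(100, 3); bsmall = rand(100) # a small random least-squares problem
Asmall \ bsmall ≈ qr(Asmall) \ bsmall # should print: true, since both minimize the 2-norm residual via QR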
Let's write a function to construct this solution given $a,b,m$, so that we can plug in different values easily:
using SpecialFunctions # for the Γ(x) function gamma(x)
# the fit coefficients c for a given interval (a,b) and a number m of Riemann-sum points
function cfit(a,b,m)
    Δx = (b-a)/m
    x = a .+ (1:m)*Δx # right-endpoint Riemann points a+Δx, …, b
    A = [exp.(x) sin.(x) gamma.(x)]
    b = 1 ./ x # the right-hand side (reusing the name b)
    return A \ b # the least-squares coefficients
end
cfit (generic function with 1 method)
Here, $(a,b) = (1,2)$. Let's try evaluating our cfit function above for various numbers of points m from 10 to $10^6$ and see the convergence:
[cfit(1,2,m) for m in 10 .^ (1:6)]
6-element Array{Array{Float64,1},1}:
 [-0.1078660461660563, 0.004992125477159778, 1.292160813588242]
 [-0.10777942089180041, 0.008771206004915781, 1.287661305514134]
 [-0.1077763553565129, 0.009165342313085053, 1.2872236707205442]
 [-0.10777611040978795, 0.009204892920088487, 1.2871800783321041]
 [-0.10777608653642387, 0.009208849324636559, 1.287175720849681]
 [-0.10777608415530587, 0.009209244978506566, 1.2871752851190472]
Great, it seems to be converging!
In particular, it is converging to $\boxed{c \approx (-0.1078, 0.009209, 1.2872)}$.
(A more careful analysis shows that the Riemann-sum approximation has an error that converges as $O(1/m)$. There are much better ways to do numerical integration!)
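If you want to see that $O(1/m)$ rate numerically, here is an optional check (not required for the pset): if the error is $O(1/m)$, the differences between successive refinements should shrink by about 10× for each 10× increase in $m$.
diffs = [norm(cfit(1,2,10m) - cfit(1,2,m)) for m in 10 .^ (1:4)] # ||c(10m) - c(m)|| for m = 10, 100, 1000, 10^4
diffs[1:end-1] ./ diffs[2:end] # each ratio should be ≈ 10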
Here, $(a,b) = (0,1)$. Let's try our cfit function again:
[cfit(0,1,m) for m in 10 .^ (1:6)]
6-element Array{Array{Float64,1},1}:
 [0.6508183359559631, -1.925841393331335, 0.9999015363513064]
 [0.6498620853519183, -1.8963532983984785, 0.9989437018833396]
 [0.6309184107226293, -1.8337108675743912, 0.9998609606479877]
 [0.6269075397198692, -1.8204787595543002, 0.9999805673969226]
 [0.6261104248573652, -1.8178504788404921, 0.999997411674164]
 [0.6259715730673621, -1.8173922750759735, 0.9999996737972219]
Again, it seems to be converging, this time to $\boxed{c \approx (0.626, -1.817, 1.0)}$.
This is actually the right answer. But something funny is going on here, and it is worth our time to think about it more carefully.
Our function $1/x$ diverges as $x \to 0^+$, and worse, the integral of its square diverges. That means that we will get $$ \int_a^b \left|\frac{1}{x} - f(x)\right|^2 dx = \infty $$ unless our fit function $f(x)$ exactly cancels the divergence.
$e^x$ and $\sin(x)$ do not diverge, but $\Gamma(x)$ does. Trefethen's hint tells you that $\Gamma(x)^{-1}$ has slope 1 at $x=0$, which means (since $1/\Gamma$ is smooth) $\Gamma(x)^{-1} = x + O(x^2)$, i.e. $\Gamma(x) = \frac{1}{x + O(x^2)} = \frac{1}{x} + O(1) = \frac{1}{x} + \mbox{finite}$. So, $\Gamma(x)$ has exactly the divergence that we need, and even the right coefficient, so for $a=0$ we must have exactly $\boxed{c_3 = 1}$, which is what we were getting numerically above.
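We can see this finite limit numerically (a quick sanity check): $\Gamma(x) - 1/x$ tends to a finite constant as $x \to 0^+$ (it is $-\gamma \approx -0.577$, as we'll see below):
[gamma(x) - 1/x for x in (0.1, 0.01, 0.001)] # ≈ [-0.486, -0.567, -0.576]: approaching a finite limit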
It's worth digging deeper — why didn't our numerics run into trouble?
We got "lucky", because I chose a quadrature rule that didn't evaluate $f(0)$ — the smallest $x$ I evaluated was $f(\Delta x)$. Even so, our A matrix is becoming badly conditioned as we increase $m$, since the third column is blowing up. We can compute this by calling cond(A)
to find the condition number:
function cfit_cond(a,b,m)
    Δx = (b-a)/m
    x = a .+ (1:m)*Δx
    A = [exp.(x) sin.(x) gamma.(x)]
    cond(A) # condition number of the fitting matrix
end
[cfit_cond(0,1,m) for m in 10 .^ (1:6)]
6-element Array{Float64,1}:
   128.2268245943078
   146.28792430783017
   379.08689306753394
  1146.0959245137424
  3592.3712785121734
 11342.979990053122
Luckily, double precision covered for our "sins". Even for $m=10^6$, we get a condition number of only about $10^4$, which amplifies roundoff errors of about $10^{-16}$ to merely about $10^{-12}$, nowhere near enough to prevent us from achieving 1% accuracy.
Alternatively, we could have set $c_3 = 1$ analytically and done a two-parameter fit by moving the $\Gamma(x)$ term to the right-hand side:
# fit function for a=0 to b, in which the gamma coefficient is required to be 1:
function cfit(b,m)
    a = 0
    Δx = (b-a)/m
    x = a .+ (1:m)*Δx
    A = [exp.(x) sin.(x)] # matrix for the 2-parameter fit
    @show cond(A)
    b = 1 ./ x .- gamma.(x) # Γ(x) term is moved to the right-hand side
    return [A \ b; 1] # the least-squares coefficients and c₃=1
end
[cfit(1,m) for m in 10 .^ (1:6)]
cond(A) = 20.19153545232387
cond(A) = 16.75382995902203
cond(A) = 16.46483632485854
cond(A) = 16.436400349836358
cond(A) = 16.433561315633575
cond(A) = 16.433277457776853
6-element Array{Array{Float64,1},1}:
 [0.6498926402534453, -1.9230419083576036, 1.0]
 [0.627952732095168, -1.8266517509709703, 1.0]
 [0.6261437336323719, -1.8182329520772051, 1.0]
 [0.6259660174699866, -1.8174011087506348, 1.0]
 [0.6259482771509939, -1.8173180230331019, 1.0]
 [0.6259465034315377, -1.817309715445783, 1.0]
It's still converging to the same $\boxed{c \approx (0.626, -1.817, 1.0)}$ result as before (where the 1.0 is no longer a fit degree of freedom), but in this case our matrix is well-conditioned.
Even for the "clever" solution where we analytically set $c_3 = 1$, we still need to compute $\frac{1}{x} - \Gamma(x)$ for the right-hand side vector at $x = \Delta x, 2\Delta x, \ldots$. This difference is susceptible to catastrophic cancellation if $x$ is very small:
[1/x - gamma(x) for x in 0.1 .^ (1:20)] # compute 1/x - Γ(x) for x = 0.1,0.01,…,1e-20
20-element Array{Float64,1}:
 0.4864923013312694
 0.5674148808493982
 0.5762275154045255
 0.5771167683760723
 0.5772057744325139
 0.577214675839059
 0.5772155653685331
 0.5772156566381454
 0.5772156715393066
 0.5772151947021484
 0.57720947265625
 0.5771484375
 0.578125
 0.578125
 0.5
 2.0
 0.0
 0.0
 0.0
 0.0
Notice how the answer is simply 0.0 for $x \lesssim 10^{-17}$ — all of the significant digits have cancelled, even though the exact answer is:
$$
\lim_{x\to 0^+} \left[ \frac{1}{x} - \Gamma(x) \right] = \gamma \approx 0.57721566490153286060651209\ldots
$$
($\gamma$ is the Euler–Mascheroni constant).
As usual, what saved us is double precision — we only went up to $m=10^6$ or $\Delta x = 10^{-6}$, for which cancellation "only" loses 6 digits … no problem for 1% accuracy.
But if we wanted to go to even greater accuracy, we would have a problem. In order to get around this, we would need a better way to compute the function $1/x - \Gamma(x)$, much like the cotangent difference you considered in pset 1. One possibility would be to switch to a series expansion for small $x$: $$ \frac{1}{x} - \Gamma(x) = \gamma - \frac{1}{6}\left(3\gamma^2 + \frac{\pi^2}{2}\right)x + \cdots $$ This is always a bit tricky to implement — you have to decide for what $|x|$ you switch to the series, and how many terms to keep, in order to maintain a given precision. Many special functions are implemented in this way, stitching together various polynomial and rational-function approximations in different portions of their domain, but it is rather delicate to get right.
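For example, here is a minimal sketch of such a switchover. (The cutoff $10^{-4}$ and the use of just two series terms are illustrative choices, not a carefully engineered implementation.)
const γ = MathConstants.eulergamma # the Euler–Mascheroni constant from Julia's Base
# sketch: compute 1/x - Γ(x), switching to the series for small x to avoid cancellation
function inv_minus_gamma(x)
    if abs(x) < 1e-4
        return γ - (3γ^2 + π^2/2)/6 * x # dropped terms are O(x²) ≲ 1e-8 here
    else
        return 1/x - gamma(x) # cancellation loses only ~4 digits at worst here
    end
end
inv_minus_gamma(1e-10) # ≈ 0.5772156649, whereas the naive formula loses ~10 digits at this x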
Sometimes, one can exploit various identities in order to rearrange a formula like this into something more stable. I'm not seeing anything at first glance. In any case, you weren't expected to deal with this (since you only needed 1% accuracy).
It is also interesting to think about how we would solve the "exact" integral least-squares problem. One option is to forget everything we've ever learned about linear algebra and just take partial derivatives of the integral with respect to the parameters, but that's no fun. Instead, let's think of the "matrix" $A$ as the linear operator that takes coefficients $c = (c_1,c_2,c_3)$ and gives you a function $f(x)$: $$ Ac = c_1 e^x + c_2 \sin(x) + c_3 \Gamma(x) . $$ This is like a "matrix with infinitely many rows", since it maps $c\in \mathbb{C}^3$ into a space of functions.
Now, we know that minimizing $\Vert Ac-b \Vert_2$ leads to the normal equations $A^* A c = A^* b$, but what is $A^*$? The key is to realize that we are implicitly using an inner product ("dot product") given by $$f \cdot g = \int_a^b \bar{f}(x) g(x)dx ,$$ from which we get our $L_2$ norm $\Vert f \Vert^2 = \int_a^b |f(x)|^2 dx$. In this sense, $A^*$ is the adjoint ("dual") linear operator that, given a function $f$, computes the "dot" product of $f$ with each of the columns: $$ A^* f = \begin{pmatrix} e^x \cdot f(x) \\ \sin(x) \cdot f(x) \\ \Gamma(x) \cdot f(x) \end{pmatrix} . $$ All of our algebra for least-squares still works if we define $A^*$ in this way! So, we end up with the $3 \times 3$ normal equations: $$ \underbrace{\begin{pmatrix} e^x \cdot e^x & e^x \cdot \sin(x) & e^x \cdot \Gamma(x) \\ \sin(x) \cdot e^x & \sin(x) \cdot \sin(x) & \sin(x) \cdot \Gamma(x) \\ \Gamma(x) \cdot e^x & \Gamma(x) \cdot \sin(x) & \Gamma(x) \cdot \Gamma(x) \end{pmatrix}}_{A^* A} c = \underbrace{\begin{pmatrix} e^x \cdot \frac{1}{x} \\ \sin(x) \cdot \frac{1}{x} \\ \Gamma(x) \cdot \frac{1}{x} \end{pmatrix}}_{A^* (1/x)} $$
Now that we've formulated it this way, we still need to compute the integrals, but we can compute them any way we like. For example, we can compute them using the QuadGK package for Julia, which uses fancy "adaptive" methods that use a different number of points depending on what the function is doing. The main point is, we no longer need to know precisely how the numerical integration works, since it all happens "inside" $A^* A$:
using QuadGK # for the quadgk numerical integration function
function cfit2(a,b; rtol=1e-12) # continuous-x fit, doing integrals to relative tolerance rtol
    A = [exp,sin,gamma]
    AᵀA = [quadgk(x -> f(x)*g(x), a,b, rtol=rtol)[1] for f in A, g in A]
    Aᵀb = [quadgk(x -> f(x)/x, a,b, rtol=rtol)[1] for f in A]
    display(AᵀA)
    display(Aᵀb)
    return AᵀA \ Aᵀb # the least-squares coefficients
end
cfit2(1,2)
3×3 Array{Float64,2}:
 23.6045   4.48756   4.32093
  4.48756  0.916525  0.881419
  4.32093  0.881419  0.852602
3-element Array{Float64,1}:
 3.059116539645953
 0.6593299064355118
 0.6398719692098005
3-element Array{Float64,1}:
 -0.10777608389080662
  0.009209288940199732
  1.2871752367047251
Before, our results seemed to be converging to $\boxed{c \approx (-0.1078, 0.009209, 1.2872)}$, and now we see that indeed this is correct for the "exact" integrals (to 10+ digits) as well.
If we try to do the $\int_0^1$ fit this way, however, we will have a problem because the integrals involving $\Gamma(x)$ and $1/x$ no longer converge:
cfit2(0,1)
DomainError with 3.5601181736115222e-307:
integrand produced Inf in the interval (0.0, 7.120236347223045e-307)

Stacktrace:
 [1] evalrule(::var"#47#51"{typeof(gamma),typeof(exp)}, …) at …/QuadGK/src/evalrule.jl:41
 ⋮
 [14] cfit2(::Int64, ::Int64) at ./In[23]:4
 [15] top-level scope at In[24]:1
In this case, we must set $c_3=1$ analytically:
# special case for a=0, where c₃=1
function cfit2(b; rtol=1e-12) # continuous-x fit, doing integrals to relative tolerance rtol
    a = 0
    A = [exp,sin] # Γ(x) is moved to the right-hand side
    AᵀA = [quadgk(x -> f(x)*g(x), a,b, rtol=rtol)[1] for f in A, g in A]
    Aᵀb = [quadgk(x -> f(x)*(1/x-gamma(x)), a,b, rtol=rtol)[1] for f in A]
    display(AᵀA)
    display(Aᵀb)
    return [AᵀA \ Aᵀb; 1] # the least-squares coefficients, including c₃=1
end
cfit2(1)
2×2 Array{Float64,2}:
 3.19453   0.909331
 0.909331  0.272676

2-element Array{Float64,1}:
 0.34706840472875855
 0.0736563323865726

3-element Array{Float64,1}:
  0.6259463063551046
 -1.8173087923915738
  1.0
Again, these results are consistent with the $\boxed{c \approx (0.626, -1.817, 1.0)}$ results that we obtained in part (b). Again, we see that our $2\times 2$ matrix $A^*A$ is well-conditioned.
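Indeed, plugging the displayed $A^*A$ entries into cond confirms this (a quick check):
cond([3.19453 0.909331; 0.909331 0.272676]) # ≈ 270: quite moderate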
(Our accuracy might still be limited by the $A^* (1/x - \Gamma(x))$ computation because of cancellation error, as noted above, unless we find a better implementation of this special function. However, we are helped by the fact that quadgk never evaluates the integrand exactly at its endpoints, and it uses a very fast-converging quadrature algorithm that doesn't even get particularly close to the endpoints for smooth integrands.)
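For instance (an illustrative check), quadgk happily integrates a function with an integrable endpoint singularity, precisely because it never evaluates at the endpoints:
quadgk(x -> 1/sqrt(x), 0, 1) # ≈ 2.0 (plus an error estimate), even though the integrand blows up at x = 0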