One special type of matrix for which we can solve problems much more quickly is a permutation matrix, which we introduced earlier in the context of the $PA=LU$ factorization.
# construct a permutation matrix P from the permutation vector p
function permutation_matrix(p)
    P = zeros(Int, length(p), length(p))
    for i = 1:length(p)
        P[i, p[i]] = 1
    end
    return P
end
permutation_matrix (generic function with 1 method)
P = permutation_matrix([2,4,1,5,3])
5×5 Array{Int64,2}:
 0  1  0  0  0
 0  0  0  1  0
 1  0  0  0  0
 0  0  0  0  1
 0  0  1  0  0
"ONE-HOT VECTOR"
"ONE-HOT VECTOR"
I₅ = eye(5)
5×5 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0
P * I₅
5×5 Array{Float64,2}:
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 1.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0
 0.0  0.0  1.0  0.0  0.0
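Multiplying a vector by $P$ permutes its entries in the same way. A minimal illustration (this v is a hypothetical example vector, not from above):
v = [10, 20, 30, 40, 50]
P*v   # entry i of the result is v[p[i]], giving [20, 40, 10, 50, 30]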
The inverse of any permutation matrix $P$ turns out to be its transpose $P^T$: we just swap rows and columns. In Julia, this is denoted P' (technically, this is the conjugate transpose, and P.' is the transpose, but the two are the same for real matrices, where complex conjugation does nothing).
P
5×5 Array{Int64,2}:
 0  1  0  0  0
 0  0  0  1  0
 1  0  0  0  0
 0  0  0  0  1
 0  0  1  0  0
P'*P
5×5 Array{Int64,2}:
 1  0  0  0  0
 0  1  0  0  0
 0  0  1  0  0
 0  0  0  1  0
 0  0  0  0  1
P*P'
5×5 Array{Int64,2}:
 1  0  0  0  0
 0  1  0  0  0
 0  0  1  0  0
 0  0  0  1  0
 0  0  0  0  1
The reason this works is that $P^T P$ computes the dot products of all the columns of $P$ with each other, and the columns of $P$ are orthonormal (orthogonal with length 1): the dot product of each column with itself is 1, and the dot product of two distinct columns is 0. We say that $P$ is an example of an "orthogonal" matrix or a "unitary" matrix. We will have much more to say about such matrices later in 18.06.
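We can check this interpretation directly by computing every pairwise dot product of the columns of P (a quick sanity check using the P defined above):
# entry (i,j) is the dot product of columns i and j of P; the result is the identity matrix
[P[:,i] ⋅ P[:,j] for i = 1:5, j = 1:5]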
A = rand(3,5)
3×5 Array{Float64,2}:
 0.656093   0.0785021  0.0874409  0.741967  0.401168
 0.0552685  0.0664353  0.369016   0.527357  0.0764656
 0.133683   0.416092   0.52589    0.838358  0.628186
A'
5×3 Array{Float64,2}:
 0.656093   0.0552685  0.133683
 0.0785021  0.0664353  0.416092
 0.0874409  0.369016   0.52589
 0.741967   0.527357   0.838358
 0.401168   0.0764656  0.628186
# construct the transpose of A explicitly, entry by entry: B[j,i] = A[i,j]
function my_transpose(A)
    m, n = size(A)
    B = zeros(n, m)
    for i = 1:m, j = 1:n
        B[j,i] = A[i,j]
    end
    return B
end
my_transpose (generic function with 1 method)
A
3×5 Array{Float64,2}:
 0.656093   0.0785021  0.0874409  0.741967  0.401168
 0.0552685  0.0664353  0.369016   0.527357  0.0764656
 0.133683   0.416092   0.52589    0.838358  0.628186
my_transpose(A)
5×3 Array{Float64,2}:
 0.656093   0.0552685  0.133683
 0.0785021  0.0664353  0.416092
 0.0874409  0.369016   0.52589
 0.741967   0.527357   0.838358
 0.401168   0.0764656  0.628186
A = rand(3,3)
B = rand(3,3)
(A*B)'
3×3 Array{Float64,2}:
 0.0657944  0.133402  0.10374
 0.974459   0.506739  0.981765
 0.591419   0.472668  1.34456
B' * A'
3×3 Array{Float64,2}:
 0.0657944  0.133402  0.10374
 0.974459   0.506739  0.981765
 0.591419   0.472668  1.34456
Transposes are important in linear algebra because they have a special relationship to matrix and vector products:
$$ (AB)^T = B^T A^T $$
and hence for a dot product (inner product) $x^T y$:
$$ x \cdot (Ay) = x^T (Ay) = (A^T x)^T y = (A^T x) \cdot y $$
We can even turn this around and use it as the definition of the transpose: $A^T$ is the matrix that "moves" $A$ from one side of a dot product to the other.
x = rand(3)
3-element Array{Float64,1}:
 0.289551
 0.257849
 0.976208
y = rand(3)
3-element Array{Float64,1}:
 0.284907
 0.180932
 0.28499
x ⋅ (A*y) # The transpose of A is the unique matrix such that
# x ⋅ (A*y) = (A'x)⋅y for all x and y
0.6837202268882089
(A'x)⋅y
0.6837202268882088
C = rand(-9:9, 4,4)
D = rand(-9:9, 4,4)
(C*D)' == D'*C'
true
From the above property, we have:
$$ (A A^{-1})^T = (A^{-1})^T A^T = I^T = I $$
and it follows that:
$$ (A^{-1})^T = (A^T)^{-1} $$
The transpose of the inverse is the inverse of the transpose.
A = [4 -2 -7 -4 -8
9 -6 -6 -1 -5
-2 -9 3 -5 2
9 7 -9 5 -8
-1 6 -3 9 6]
5×5 Array{Int64,2}:
  4  -2  -7  -4  -8
  9  -6  -6  -1  -5
 -2  -9   3  -5   2
  9   7  -9   5  -8
 -1   6  -3   9   6
inv(A')
5×5 Array{Float64,2}:
  0.0109991   0.131989  -0.235564  -0.301558   0.2044
  0.529789    0.35747   -0.179652  -0.69172    0.678582
 -0.908341   -0.900092   0.370302   1.48701   -1.29667
 -0.635197   -0.622365   0.353804   1.16499   -1.05408
 -0.0879927  -0.055912  -0.11549    0.0791323  0.0314696
inv(A)'
5×5 Array{Float64,2}:
  0.0109991   0.131989  -0.235564  -0.301558   0.2044
  0.529789    0.35747   -0.179652  -0.69172    0.678582
 -0.908341   -0.900092   0.370302   1.48701   -1.29667
 -0.635197   -0.622365   0.353804   1.16499   -1.05408
 -0.0879927  -0.055912  -0.11549    0.0791323  0.0314696
As expected, they match!
If $A = LU$, then $A^T = U^T L^T$. Note that $U^T$ is lower triangular, and $L^T$ is upper triangular. This means that once we have the LU factorization of $A$, we immediately have a similar (lower times upper) factorization of $A^T$.
L,U,p = lu(A)
([1.0 0.0 … 0.0 0.0; 1.0 1.0 … 0.0 0.0; … ; -0.111111 0.410256 … 1.0 0.0; -0.222222 -0.794872 … 0.0242696 1.0], [9.0 -6.0 … -1.0 -5.0; 0.0 13.0 … 6.0 -3.0; … ; 0.0 0.0 … 8.67894 9.95297; 0.0 0.0 … 0.0 -0.771206], [2, 4, 1, 5, 3])
p
5-element Array{Int64,1}:
 2
 4
 1
 5
 3
L'
5×5 Array{Float64,2}:
 1.0  1.0  0.444444   -0.111111  -0.222222
 0.0  1.0  0.0512821   0.410256  -0.794872
 0.0  0.0  1.0         0.582822   0.171779
 0.0  0.0  0.0         1.0        0.0242696
 0.0  0.0  0.0         0.0        1.0
U'
5×5 Array{Float64,2}:
  9.0   0.0   0.0       0.0       0.0
 -6.0  13.0   0.0       0.0       0.0
 -6.0  -3.0  -4.17949   0.0       0.0
 -1.0   6.0  -3.86325   8.67894   0.0
 -5.0  -3.0  -5.62393   9.95297  -0.771206
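As a quick consistency check (using the L, U, and p computed above): since $PA = LU$ means $A = P^T L U$, we should have $A^T = U^T L^T P$:
A' ≈ U'*L'*permutation_matrix(p)   # true, up to roundoff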
In particular, suppose we know the $PA = LU$ factorization for $A$, but we want to solve $A^T x = b$. Since $A^T x = U^T L^T P x = b$, we can:
1. Solve $U^T c = b$ for $c$ by forward-substitution ($U^T$ is lower triangular).
2. Solve $L^T d = c$ for $d = Px$ by back-substitution ($L^T$ is upper triangular).
3. Compute $x = P^T d$ to undo the permutation.
Let's try it:
b = [4,2,1,-2,3] # "randomly" chosen right-hand side
A' \ b # correct solution to Aᵀx = b
5-element Array{Float64,1}:
   1.28873
   6.07363
 -11.9273
  -8.92392
  -0.643141
c = U' \ b # forward-substitution
d = L' \ c # backsubstitution
permutation_matrix(p)' * d
5-element Array{Float64,1}:
   1.28873
   6.07363
 -11.9273
  -8.92392
  -0.643141
As usual, the lufact(A) object (which encapsulates $L$, $U$, and $P$) does all of this for you, and more efficiently, because it takes advantage of the special structure of these matrices (which we didn't bother to do above):
LU = lufact(A)
LU' \ b
5-element Array{Float64,1}:
   1.28873
   6.07363
 -11.9273
  -8.92392
  -0.643141
A very important type of matrix that arises frequently in real problems (we will have much more to say about this later in the course, after exam 2) is a symmetric matrix: a matrix $S$ that is equal to its transpose $S = S^T$.
Given any matrix $A$, we can make a symmetric matrix out of it very easily in two ways: the sum $A + A^T$ is symmetric, and so is the product $A^T A$ (or $A A^T$). Let's try both, starting from our $A$ above:
A
5×5 Array{Int64,2}:
  4  -2  -7  -4  -8
  9  -6  -6  -1  -5
 -2  -9   3  -5   2
  9   7  -9   5  -8
 -1   6  -3   9   6
S = A' + A
5×5 Array{Int64,2}:
  8    7   -9    5  -9
  7  -12  -15    6   1
 -9  -15    6  -14  -1
  5    6  -14   10   1
 -9    1   -1    1  12
S' == (A')' + A' == A + A' == A' + A
true
S = A' * A
5×5 Array{Int64,2}:
  183   13  -166   21  -159
   13  206   -58  148     8
 -166  -58   184  -53   146
   21  148   -53  148    41
 -159    8   146   41   193
S' == (A' *A )' == A' * A
true
The ordinary LU factorization of a symmetric $S$, however, seems to have nothing to do with the symmetry of $S$. Is there any special relationship between $L$ and $U$ in this case?
L, U = lu(S, Val{false}) # LU without pivoting
([1.0 0.0 … 0.0 0.0; 0.0710383 1.0 … 0.0 0.0; … ; 0.114754 0.714408 … 1.0 0.0; -0.868852 0.0940872 … 1.11804 1.0], [183.0 13.0 … 21.0 -159.0; 0.0 205.077 … 146.508 19.2951; … ; 0.0 0.0 … 40.8852 45.7112; 0.0 0.0 … 0.0 0.303428], [1, 2, 3, 4, 5])
L
5×5 Array{Float64,2}:
  1.0        0.0        0.0         0.0      0.0
  0.0710383  1.0        0.0         0.0      0.0
 -0.907104  -0.225319   1.0         0.0      0.0
  0.114754   0.714408  -0.0408412   1.0      0.0
 -0.868852   0.0940872  0.265894    1.11804  1.0
U
5×5 Array{Float64,2}:
 183.0   13.0    -166.0      21.0      -159.0
   0.0  205.077   -46.2077  146.508      19.2951
   0.0    0.0      23.0093   -0.939727    6.11804
   0.0    0.0       0.0      40.8852     45.7112
   0.0    0.0       0.0       0.0         0.303428
$U$ and $L$ seem quite different because $L$ has 1's along the diagonal, but $U$ has some other numbers (the pivots). We can extract these with diag(U) in Julia:
diag(U)
5-element Array{Float64,1}:
 183.0
 205.077
  23.0093
  40.8852
   0.303428
We could make $U$ look more like $L$ if we divided each row of $U$ by its pivot. That corresponds to computing $D^{-1} U$, where $D$ is the diagonal matrix of the pivots:
D = diagm(diag(U)) # diagm makes a diagonal matrix from a 1d array
5×5 Array{Float64,2}:
 183.0    0.0      0.0      0.0      0.0
   0.0  205.077    0.0      0.0      0.0
   0.0    0.0     23.0093   0.0      0.0
   0.0    0.0      0.0     40.8852   0.0
   0.0    0.0      0.0      0.0      0.303428
Since a diagonal matrix just multiplies each row by a single number, the inverse of a diagonal matrix simply divides each row by that number: its diagonal entries are the reciprocals of the original ones:
inv(D)
5×5 Array{Float64,2}:
 0.00546448  0.0         0.0        0.0        0.0
 0.0         0.00487623  0.0        0.0        0.0
 0.0         0.0         0.0434607  0.0        0.0
 0.0         0.0         0.0        0.0244587  0.0
 0.0         0.0         0.0        0.0        3.29568
inv(D) * U
5×5 Array{Float64,2}:
 1.0  0.0710383  -0.907104   0.114754   -0.868852
 0.0  1.0        -0.225319   0.714408    0.0940872
 0.0  0.0         1.0       -0.0408412   0.265894
 0.0  0.0         0.0        1.0         1.11804
 0.0  0.0         0.0        0.0         1.0
Wait a minute, now the entries look exactly like those of $L$, except above the diagonal rather than below. In fact, this is precisely the transpose of $L$:
L'
5×5 Array{Float64,2}:
 1.0  0.0710383  -0.907104   0.114754   -0.868852
 0.0  1.0        -0.225319   0.714408    0.0940872
 0.0  0.0         1.0       -0.0408412   0.265894
 0.0  0.0         0.0        1.0         1.11804
 0.0  0.0         0.0        0.0         1.0
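We can confirm the match programmatically (a one-line check using the D, U, and L from above):
inv(D)*U ≈ L'   # true, up to roundoff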
S
5×5 Array{Int64,2}:
  183   13  -166   21  -159
   13  206   -58  148     8
 -166  -58   184  -53   146
   21  148   -53  148    41
 -159    8   146   41   193
(For later use, we also rebuild $S$ from these factors. Note that this replaces our exact integer $S$ with a floating-point version, which will matter below.)
D = abs.(D)   # the pivots of S = AᵀA are positive, so taking absolute values changes nothing
S = L*D*L'    # reconstruct S (now floating-point) from its LDLᵀ factors
U = L'        # the normalized (unit-diagonal) upper-triangular factor
L
5×5 Array{Float64,2}:
  1.0        0.0        0.0         0.0      0.0
  0.0710383  1.0        0.0         0.0      0.0
 -0.907104  -0.225319   1.0         0.0      0.0
  0.114754   0.714408  -0.0408412   1.0      0.0
 -0.868852   0.0940872  0.265894    1.11804  1.0
Since $D^{-1} U = L^T$, we have $U = D L^T$, and hence $S = LU = L D L^T$.
This fact is so important that it has its own name: we have constructed the LDLᵀ factorization of our symmetric matrix $S$. This factorization is useful for two reasons:
1. It preserves the special structure of a symmetric matrix, which is important if we are to do subsequent algebraic manipulations: $(LDL^T)^T = LDL^T$.
2. Clever implementations can save roughly a factor of two in the number of operations by exploiting the symmetry (a solve using these factors is sketched below).
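For instance, once we have the LDLᵀ factors of $S$, solving $Sx = b$ takes just a forward-substitution, a division by the pivots, and a back-substitution. A minimal sketch (this b is a hypothetical right-hand side, and L and D are the factors from above):
b = [1.0, 2, 3, 4, 5]     # hypothetical right-hand side
c = L \ b                 # forward-substitution: solve Lc = b
x = L' \ (c ./ diag(D))   # divide by the pivots, then back-substitute: Lᵀx = D⁻¹c
S*x ≈ b                   # true, up to roundoff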
Finally, we should mention another very important variation on this theme.
Suppose that we have a symmetric matrix $S$ in which all the pivots are positive. This is called a positive-definite matrix, and it turns out to be the case whenever you construct $S$ from $A^T A$ or $A A^T$ for real $A$ (provided the columns or rows of $A$, respectively, are linearly independent), as above. We will have much more to say about such matrices later in the course. A quick check of the pivots is sketched below.
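(A minimal sketch with hypothetical variable names: we refactor $S = A^T A$ from scratch, since our D above has already been passed through abs.)
Ls, Us, ps = lu(A'*A, Val{false})   # LU of AᵀA without pivoting
all(diag(Us) .> 0)                  # true: every pivot is positive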
In that case, we can take the square roots of the pivots to write $D = KK$, where $K$ is a diagonal matrix of the square roots of the pivots. Below, we directly form the product $\hat{L} = LK$, which scales the columns of $L$ by the square roots of the pivots:
L̂ = L * diagm(sqrt.(diag(D)))   # L̂ = L*K, where K = diagm(sqrt.(diag(D)))
5×5 Array{Float64,2}:
  13.5277    0.0        0.0        0.0      0.0
   0.960988  14.3205    0.0        0.0      0.0
 -12.2711    -3.22668   4.7968     0.0      0.0
   1.55236   10.2307   -0.195907   6.39416  0.0
 -11.7536     1.34738   1.27544    7.14891  0.550843
L̂*L̂' - S # ≈ 0: L̂ acts like a matrix square root of S (the Cholesky factorization)
5×5 Array{Float64,2}:
 0.0  0.0   0.0          0.0          0.0
 0.0  0.0   7.10543e-15  0.0         -1.77636e-15
 0.0  0.0   2.84217e-14  7.10543e-15  0.0
 0.0  0.0   0.0          0.0          0.0
 0.0  0.0  -2.84217e-14  0.0          2.84217e-14
Then we can write $S = LDL^T = LKKL^T = (LK)(LK)^T$. The matrix $\hat{L} = LK$ is also lower triangular: it is $L$ with the columns scaled by $K$. So, we can write any symmetric positive-definite (SPD) matrix as
$$ S = \hat{L} \hat{L}^T $$
This is called the Cholesky factorization of $S$, and it is usually the most efficient way to solve SPD systems (half the operations, and often half the storage, compared to LU). In Julia, it is computed by chol (which returns $\hat{L}^T$) or cholfact:
chol(S)
ArgumentError: matrix is not symmetric/Hermitian. This error can be avoided by calling chol(Hermitian(A)) which will ignore either the upper or lower triangle of the matrix.
Stacktrace:
 [1] chol(::Array{Float64,2}) at ./linalg/cholesky.jl:185
 [2] include_string(::String, ::String) at ./loading.jl:522
 [3] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/stevenj/.julia/v0.6/IJulia/src/execute_request.jl:193
 [4] (::Compat.#inner#14{Array{Any,1},IJulia.#execute_request,Tuple{ZMQ.Socket,IJulia.Msg}})() at /Users/stevenj/.julia/v0.6/Compat/src/Compat.jl:332
 [5] eventloop(::ZMQ.Socket) at /Users/stevenj/.julia/v0.6/IJulia/src/eventloop.jl:8
 [6] (::IJulia.##13#16)() at ./task.jl:335
Whoops! The problem is that $S$ is supposed to be symmetric, but our reconstructed floating-point $S = LDL^T$ is not exactly symmetric due to roundoff errors:
S - S'
5×5 Array{Float64,2}:
 0.0   0.0           0.0           0.0           0.0
 0.0   0.0          -7.10543e-15   0.0           1.77636e-15
 0.0   7.10543e-15   0.0          -7.10543e-15  -2.84217e-14
 0.0   0.0           7.10543e-15   0.0           0.0
 0.0  -1.77636e-15   2.84217e-14   0.0           0.0
So, we have to tell Julia explicitly that the matrix is symmetric, so that it ignores these tiny differences between the upper and lower triangles:
chol(Symmetric(S))
5×5 UpperTriangular{Float64,Array{Float64,2}}:
 13.5277  0.960988  -12.2711    1.55236   -11.7536
   ⋅      14.3205    -3.22668  10.2307      1.34738
   ⋅        ⋅         4.7968   -0.195907    1.27544
   ⋅        ⋅          ⋅        6.39416     7.14891
   ⋅        ⋅          ⋅         ⋅          0.550843
This is the same as our $\hat{L}^T$ matrix from above:
L̂'
5×5 Array{Float64,2}:
 13.5277  0.960988  -12.2711    1.55236   -11.7536
  0.0     14.3205    -3.22668  10.2307      1.34738
  0.0      0.0        4.7968   -0.195907    1.27544
  0.0      0.0        0.0       6.39416     7.14891
  0.0      0.0        0.0       0.0         0.550843
One interesting fact about the Cholesky factorization of SPD matrices is that row swaps are never required, even when roundoff errors are taken into account, so there is no $P$ matrix.
We won't do much with Cholesky factorizations in 18.06, but they are extremely useful in many applications of linear algebra. If you are solving a system of equations whose matrix is SPD (for example, one involving $A^T A$), then you gain a factor of two or more by telling the computer to use Cholesky factorization instead of LU, and there are many other uses of Cholesky factorization as well.
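For example (a minimal sketch, reusing S and the hypothetical b from the LDLᵀ solve above), the factorization object returned by cholfact can be used just like the one from lufact:
F = cholfact(Symmetric(S))   # Cholesky factorization object: no row swaps, about half the work of LU
x = F \ b                    # two triangular solves, with L̂ and L̂ᵀ
S*x ≈ b                      # true, up to roundoff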