One special type of matrix for which we can solve problems much more quickly is a permutation matrix, which we introduced earlier in the context of the $PA=LU$ factorization.
# construct a permutation matrix P from the permutation vector p
function permutation_matrix(p)
    P = zeros(Int, length(p), length(p))
    for i = 1:length(p)
        P[i, p[i]] = 1
    end
    return P
end
permutation_matrix (generic function with 1 method)
P = permutation_matrix([2,4,1,5,3])
5×5 Array{Int64,2}:
 0  1  0  0  0
 0  0  0  1  0
 1  0  0  0  0
 0  0  0  0  1
 0  0  1  0  0
"ONE-HOT VECTOR"
"ONE-HOT VECTOR"
I₅ = eye(5)
5×5 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0
P * I₅
5×5 Array{Float64,2}:
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 1.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0
 0.0  0.0  1.0  0.0  0.0
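Multiplying a vector by $P$ permutes its entries in the same way. A minimal illustration (this v is a hypothetical example vector, not from above):
v = [10, 20, 30, 40, 50]
P*v   # entry i of the result is v[p[i]], giving [20, 40, 10, 50, 30]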
The inverse of any permutation matrix $P$ turns out to be its transpose $P^T$: we just swap rows and columns. In Julia, this is denoted P' (technically, this is the conjugate transpose, and P.' is the transpose, but the two are the same for real matrices, where complex conjugation does nothing).
P
5×5 Array{Int64,2}:
 0  1  0  0  0
 0  0  0  1  0
 1  0  0  0  0
 0  0  0  0  1
 0  0  1  0  0
P'*P
5×5 Array{Int64,2}:
 1  0  0  0  0
 0  1  0  0  0
 0  0  1  0  0
 0  0  0  1  0
 0  0  0  0  1
P*P'
5×5 Array{Int64,2}:
 1  0  0  0  0
 0  1  0  0  0
 0  0  1  0  0
 0  0  0  1  0
 0  0  0  0  1
The reason this works is that $P^T P$ computes the dot products of all the columns of $P$ with each other, and the columns of $P$ are orthonormal (orthogonal with length 1): the dot product of each column with itself is 1, and the dot product of two distinct columns is 0. We say that $P$ is an example of an "orthogonal" matrix or a "unitary" matrix. We will have much more to say about such matrices later in 18.06.
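We can check this interpretation directly by computing every pairwise dot product of the columns of P (a quick sanity check using the P defined above):
# entry (i,j) is the dot product of columns i and j of P; the result is the identity matrix
[P[:,i] ⋅ P[:,j] for i = 1:5, j = 1:5]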
A = rand(3,5)
3×5 Array{Float64,2}:
 0.656093   0.0785021  0.0874409  0.741967  0.401168
 0.0552685  0.0664353  0.369016   0.527357  0.0764656
 0.133683   0.416092   0.52589    0.838358  0.628186
A'
5×3 Array{Float64,2}:
 0.656093   0.0552685  0.133683
 0.0785021  0.0664353  0.416092
 0.0874409  0.369016   0.52589
 0.741967   0.527357   0.838358
 0.401168   0.0764656  0.628186
# construct the transpose of A explicitly, entry by entry: B[j,i] = A[i,j]
function my_transpose(A)
    m, n = size(A)
    B = zeros(n, m)
    for i = 1:m, j = 1:n
        B[j,i] = A[i,j]
    end
    return B
end
my_transpose (generic function with 1 method)
A
3×5 Array{Float64,2}:
 0.656093   0.0785021  0.0874409  0.741967  0.401168
 0.0552685  0.0664353  0.369016   0.527357  0.0764656
 0.133683   0.416092   0.52589    0.838358  0.628186
my_transpose(A)
5×3 Array{Float64,2}:
 0.656093   0.0552685  0.133683
 0.0785021  0.0664353  0.416092
 0.0874409  0.369016   0.52589
 0.741967   0.527357   0.838358
 0.401168   0.0764656  0.628186
A = rand(3,3)
B = rand(3,3)
(A*B)'
3×3 Array{Float64,2}:
 0.0657944  0.133402  0.10374
 0.974459   0.506739  0.981765
 0.591419   0.472668  1.34456
B' * A'
3×3 Array{Float64,2}:
 0.0657944  0.133402  0.10374
 0.974459   0.506739  0.981765
 0.591419   0.472668  1.34456
Transposes are important in linear algebra because they have a special relationship to matrix and vector products:
$$ (AB)^T = B^T A^T $$
and hence for a dot product (inner product) $x^T y$:
$$ x \cdot (Ay) = x^T (Ay) = (A^T x)^T y = (A^T x) \cdot y $$
We can even turn this around and use it as the definition of the transpose: $A^T$ is the matrix that "moves" $A$ from one side of a dot product to the other.
x = rand(3)
3-element Array{Float64,1}:
 0.289551
 0.257849
 0.976208
y = rand(3)
3-element Array{Float64,1}:
 0.284907
 0.180932
 0.28499
x ⋅ (A*y) # The transpose of A is the unique matrix such that
# x ⋅ (A*y) = (A'x)⋅y for all x and y
0.6837202268882089
(A'x)⋅y
0.6837202268882088
C = rand(-9:9, 4,4)
D = rand(-9:9, 4,4)
(C*D)' == D'*C'
true
From the above property, we have:
$$ (A A^{-1})^T = (A^{-1})^T A^T = I^T = I $$
and it follows that:
$$ (A^{-1})^T = (A^T)^{-1} $$
The transpose of the inverse is the inverse of the transpose.
A = [4 -2 -7 -4 -8
9 -6 -6 -1 -5
-2 -9 3 -5 2
9 7 -9 5 -8
-1 6 -3 9 6]
5×5 Array{Int64,2}:
  4  -2  -7  -4  -8
  9  -6  -6  -1  -5
 -2  -9   3  -5   2
  9   7  -9   5  -8
 -1   6  -3   9   6
inv(A')
5×5 Array{Float64,2}:
  0.0109991   0.131989  -0.235564  -0.301558   0.2044
  0.529789    0.35747   -0.179652  -0.69172    0.678582
 -0.908341   -0.900092   0.370302   1.48701   -1.29667
 -0.635197   -0.622365   0.353804   1.16499   -1.05408
 -0.0879927  -0.055912  -0.11549    0.0791323  0.0314696
inv(A)'
5×5 Array{Float64,2}:
  0.0109991   0.131989  -0.235564  -0.301558   0.2044
  0.529789    0.35747   -0.179652  -0.69172    0.678582
 -0.908341   -0.900092   0.370302   1.48701   -1.29667
 -0.635197   -0.622365   0.353804   1.16499   -1.05408
 -0.0879927  -0.055912  -0.11549    0.0791323  0.0314696
As expected, they match!
If $A = LU$, then $A^T = U^T L^T$. Note that $U^T$ is lower triangular, and $L^T$ is upper triangular. This means that once we have the LU factorization of $A$, we immediately have a similar (lower times upper) factorization of $A^T$.
L,U,p = lu(A)
([1.0 0.0 … 0.0 0.0; 1.0 1.0 … 0.0 0.0; … ; -0.111111 0.410256 … 1.0 0.0; -0.222222 -0.794872 … 0.0242696 1.0], [9.0 -6.0 … -1.0 -5.0; 0.0 13.0 … 6.0 -3.0; … ; 0.0 0.0 … 8.67894 9.95297; 0.0 0.0 … 0.0 -0.771206], [2, 4, 1, 5, 3])
p
5-element Array{Int64,1}:
 2
 4
 1
 5
 3
L'
5×5 Array{Float64,2}:
 1.0  1.0  0.444444   -0.111111  -0.222222
 0.0  1.0  0.0512821   0.410256  -0.794872
 0.0  0.0  1.0         0.582822   0.171779
 0.0  0.0  0.0         1.0        0.0242696
 0.0  0.0  0.0         0.0        1.0
U'
5×5 Array{Float64,2}:
  9.0   0.0   0.0       0.0       0.0
 -6.0  13.0   0.0       0.0       0.0
 -6.0  -3.0  -4.17949   0.0       0.0
 -1.0   6.0  -3.86325   8.67894   0.0
 -5.0  -3.0  -5.62393   9.95297  -0.771206
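As a quick consistency check (using the L, U, and p computed above): since $PA = LU$ means $A = P^T L U$, we should have $A^T = U^T L^T P$:
A' ≈ U'*L'*permutation_matrix(p)   # true, up to roundoff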
In particular, suppose we know the $PA = LU$ factorization for $A$, but we want to solve $A^T x = b$. Since $A^T x = U^T L^T P x = b$, we can:
1. Solve $U^T c = b$ for $c$ by forward-substitution ($U^T$ is lower triangular).
2. Solve $L^T d = c$ for $d = Px$ by back-substitution ($L^T$ is upper triangular).
3. Compute $x = P^T d$ to undo the permutation.
Let's try it:
b = [4,2,1,-2,3] # "randomly" chosen right-hand side
A' \ b # correct solution to Aᵀx = b
5-element Array{Float64,1}:
   1.28873
   6.07363
 -11.9273
  -8.92392
  -0.643141
c = U' \ b # forward-substitution
d = L' \ c # backsubstitution
permutation_matrix(p)' * d
5-element Array{Float64,1}:
   1.28873
   6.07363
 -11.9273
  -8.92392
  -0.643141
As usual, the lufact(A) object (which encapsulates $L$, $U$, and $P$) does all of this for you, and more efficiently, because it takes advantage of the special structure of these matrices (which we didn't bother to do above):
LU = lufact(A)
LU' \ b
5-element Array{Float64,1}:
   1.28873
   6.07363
 -11.9273
  -8.92392
  -0.643141
A very important type of matrix that arises frequently in real problems (we will have much more to say about this later in the course, after exam 2) is a symmetric matrix: a matrix $S$ that is equal to its transpose $S = S^T$.
Given any matrix $A$, we can make a symmetric matrix out of it very easily in two ways: the sum $A + A^T$ is symmetric, and so is the product $A^T A$ (or $A A^T$). Let's try both, starting from our $A$ above:
A
5×5 Array{Int64,2}:
  4  -2  -7  -4  -8
  9  -6  -6  -1  -5
 -2  -9   3  -5   2
  9   7  -9   5  -8
 -1   6  -3   9   6
S = A' + A
5×5 Array{Int64,2}:
  8    7   -9    5  -9
  7  -12  -15    6   1
 -9  -15    6  -14  -1
  5    6  -14   10   1
 -9    1   -1    1  12
S' == (A')' + A' == A + A' == A' + A
true
S = A' * A
5×5 Array{Int64,2}:
  183   13  -166   21  -159
   13  206   -58  148     8
 -166  -58   184  -53   146
   21  148   -53  148    41
 -159    8   146   41   193
S' == (A' *A )' == A' * A
true
The ordinary LU factorization of a symmetric $S$, however, seems to have nothing to do with the symmetry of $S$. Is there any special relationship between $L$ and $U$ in this case?
L, U = lu(S, Val{false}) # LU without pivoting
([1.0 0.0 … 0.0 0.0; 0.0710383 1.0 … 0.0 0.0; … ; 0.114754 0.714408 … 1.0 0.0; -0.868852 0.0940872 … 1.11804 1.0], [183.0 13.0 … 21.0 -159.0; 0.0 205.077 … 146.508 19.2951; … ; 0.0 0.0 … 40.8852 45.7112; 0.0 0.0 … 0.0 0.303428], [1, 2, 3, 4, 5])
L
5×5 Array{Float64,2}:
  1.0        0.0        0.0         0.0      0.0
  0.0710383  1.0        0.0         0.0      0.0
 -0.907104  -0.225319   1.0         0.0      0.0
  0.114754   0.714408  -0.0408412   1.0      0.0
 -0.868852   0.0940872  0.265894    1.11804  1.0
U
5×5 Array{Float64,2}:
 183.0   13.0    -166.0      21.0      -159.0
   0.0  205.077   -46.2077  146.508      19.2951
   0.0    0.0      23.0093   -0.939727    6.11804
   0.0    0.0       0.0      40.8852     45.7112
   0.0    0.0       0.0       0.0         0.303428
$U$ and $L$ seem quite different because $L$ has 1's along the diagonal, but $U$ has some other numbers (the pivots). We can extract these with diag(U) in Julia:
diag(U)
5-element Array{Float64,1}:
 183.0
 205.077
  23.0093
  40.8852
   0.303428
We could make $U$ look more like $L$ if we divided each row of $U$ by its pivot. That corresponds to computing $D^{-1} U$, where $D$ is the diagonal matrix of the pivots:
D = diagm(diag(U)) # diagm makes a diagonal matrix from a 1d array
5×5 Array{Float64,2}:
 183.0    0.0      0.0      0.0      0.0
   0.0  205.077    0.0      0.0      0.0
   0.0    0.0     23.0093   0.0      0.0
   0.0    0.0      0.0     40.8852   0.0
   0.0    0.0      0.0      0.0      0.303428
Since a diagonal matrix just multiplies each row by a single number, the inverse of a diagonal matrix simply divides each row by that number: its diagonal entries are the reciprocals of the original ones:
inv(D)
5×5 Array{Float64,2}:
 0.00546448  0.0         0.0        0.0        0.0
 0.0         0.00487623  0.0        0.0        0.0
 0.0         0.0         0.0434607  0.0        0.0
 0.0         0.0         0.0        0.0244587  0.0
 0.0         0.0         0.0        0.0        3.29568
inv(D) * U
5×5 Array{Float64,2}:
 1.0  0.0710383  -0.907104   0.114754   -0.868852
 0.0  1.0        -0.225319   0.714408    0.0940872
 0.0  0.0         1.0       -0.0408412   0.265894
 0.0  0.0         0.0        1.0         1.11804
 0.0  0.0         0.0        0.0         1.0
Wait a minute, now the entries look exactly like those of $L$, except above the diagonal rather than below. In fact, this is precisely the transpose of $L$:
L'
5×5 Array{Float64,2}:
 1.0  0.0710383  -0.907104   0.114754   -0.868852
 0.0  1.0        -0.225319   0.714408    0.0940872
 0.0  0.0         1.0       -0.0408412   0.265894
 0.0  0.0         0.0        1.0         1.11804
 0.0  0.0         0.0        0.0         1.0
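We can confirm the match programmatically (a one-line check using the D, U, and L from above):
inv(D)*U ≈ L'   # true, up to roundoff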
S
5×5 Array{Int64,2}:
  183   13  -166   21  -159
   13  206   -58  148     8
 -166  -58   184  -53   146
   21  148   -53  148    41
 -159    8   146   41   193
(For later use, we also rebuild $S$ from these factors. Note that this replaces our exact integer $S$ with a floating-point version, which will matter below.)
D = abs.(D)   # the pivots of S = AᵀA are positive, so taking absolute values changes nothing
S = L*D*L'    # reconstruct S (now floating-point) from its LDLᵀ factors
U = L'        # the normalized (unit-diagonal) upper-triangular factor
L
5×5 Array{Float64,2}:
  1.0        0.0        0.0         0.0      0.0
  0.0710383  1.0        0.0         0.0      0.0
 -0.907104  -0.225319   1.0         0.0      0.0
  0.114754   0.714408  -0.0408412   1.0      0.0
 -0.868852   0.0940872  0.265894    1.11804  1.0
Since $D^{-1} U = L^T$, we have $U = D L^T$, and hence $S = LU = L D L^T$.
This fact is so important that it has its own name: we have constructed the LDLᵀ factorization of our symmetric matrix $S$. This factorization is useful for two reasons:
1. It preserves the special structure of a symmetric matrix, which is important if we are to do subsequent algebraic manipulations: $(LDL^T)^T = LDL^T$.
2. Clever implementations can save roughly a factor of two in the number of operations by exploiting the symmetry (a solve using these factors is sketched below).
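For instance, once we have the LDLᵀ factors of $S$, solving $Sx = b$ takes just a forward-substitution, a division by the pivots, and a back-substitution. A minimal sketch (this b is a hypothetical right-hand side, and L and D are the factors from above):
b = [1.0, 2, 3, 4, 5]     # hypothetical right-hand side
c = L \ b                 # forward-substitution: solve Lc = b
x = L' \ (c ./ diag(D))   # divide by the pivots, then back-substitute: Lᵀx = D⁻¹c
S*x ≈ b                   # true, up to roundoff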
Finally, we should mention another very important variation on this theme.
Suppose that we have a symmetric matrix $S$ in which all the pivots are positive. This is called a positive-definite matrix, and it turns out to be the case whenever you construct $S$ from $A^T A$ or $A A^T$ for real $A$ (provided the columns or rows of $A$, respectively, are linearly independent), as above. We will have much more to say about such matrices later in the course. A quick check of the pivots is sketched below.
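(A minimal sketch with hypothetical variable names: we refactor $S = A^T A$ from scratch, since our D above has already been passed through abs.)
Ls, Us, ps = lu(A'*A, Val{false})   # LU of AᵀA without pivoting
all(diag(Us) .> 0)                  # true: every pivot is positive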
In that case, we can take the square roots of the pivots to write $D = KK$, where $K$ is a diagonal matrix of the square roots of the pivots. Below, we directly form the product $\hat{L} = LK$, which scales the columns of $L$ by the square roots of the pivots:
L̂ = L * diagm(sqrt.(diag(D)))   # L̂ = L*K, where K = diagm(sqrt.(diag(D)))
5×5 Array{Float64,2}:
  13.5277    0.0        0.0        0.0      0.0
   0.960988  14.3205    0.0        0.0      0.0
 -12.2711    -3.22668   4.7968     0.0      0.0
   1.55236   10.2307   -0.195907   6.39416  0.0
 -11.7536     1.34738   1.27544    7.14891  0.550843
L̂*L̂' - S # ≈ 0: L̂ acts like a matrix square root of S (the Cholesky factorization)
5×5 Array{Float64,2}:
 0.0  0.0   0.0          0.0          0.0
 0.0  0.0   7.10543e-15  0.0         -1.77636e-15
 0.0  0.0   2.84217e-14  7.10543e-15  0.0
 0.0  0.0   0.0          0.0          0.0
 0.0  0.0  -2.84217e-14  0.0          2.84217e-14
Then we can write $S = LDL^T = LKKL^T = (LK)(LK)^T$. The matrix $\hat{L} = LK$ is also lower triangular: it is $L$ with the columns scaled by $K$. So, we can write any symmetric positive-definite (SPD) matrix as
$$ S = \hat{L} \hat{L}^T $$
This is called the Cholesky factorization of $S$, and it is usually the most efficient way to solve SPD systems (half the operations, and often half the storage, compared to LU). In Julia, it is computed by chol (which returns $\hat{L}^T$) or cholfact:
chol(S)
ArgumentError: matrix is not symmetric/Hermitian. This error can be avoided by calling chol(Hermitian(A)) which will ignore either the upper or lower triangle of the matrix.
Stacktrace:
 [1] chol(::Array{Float64,2}) at ./linalg/cholesky.jl:185
 [2] include_string(::String, ::String) at ./loading.jl:522
 [3] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/stevenj/.julia/v0.6/IJulia/src/execute_request.jl:193
 [4] (::Compat.#inner#14{Array{Any,1},IJulia.#execute_request,Tuple{ZMQ.Socket,IJulia.Msg}})() at /Users/stevenj/.julia/v0.6/Compat/src/Compat.jl:332
 [5] eventloop(::ZMQ.Socket) at /Users/stevenj/.julia/v0.6/IJulia/src/eventloop.jl:8
 [6] (::IJulia.##13#16)() at ./task.jl:335
Whoops! The problem is that $S$ is supposed to be symmetric, but our reconstructed floating-point $S = LDL^T$ is not exactly symmetric due to roundoff errors:
S - S'
5×5 Array{Float64,2}:
 0.0   0.0           0.0           0.0           0.0
 0.0   0.0          -7.10543e-15   0.0           1.77636e-15
 0.0   7.10543e-15   0.0          -7.10543e-15  -2.84217e-14
 0.0   0.0           7.10543e-15   0.0           0.0
 0.0  -1.77636e-15   2.84217e-14   0.0           0.0
So, we have to tell Julia explicitly that the matrix is symmetric, so that it ignores these tiny differences between the upper and lower triangles:
chol(Symmetric(S))
5×5 UpperTriangular{Float64,Array{Float64,2}}:
 13.5277  0.960988  -12.2711    1.55236   -11.7536
   ⋅      14.3205    -3.22668  10.2307      1.34738
   ⋅        ⋅         4.7968   -0.195907    1.27544
   ⋅        ⋅          ⋅        6.39416     7.14891
   ⋅        ⋅          ⋅         ⋅          0.550843
This is the same as our $\hat{L}^T$ matrix from above:
L̂'
5×5 Array{Float64,2}:
 13.5277  0.960988  -12.2711    1.55236   -11.7536
  0.0     14.3205    -3.22668  10.2307      1.34738
  0.0      0.0        4.7968   -0.195907    1.27544
  0.0      0.0        0.0       6.39416     7.14891
  0.0      0.0        0.0       0.0         0.550843
One interesting fact about the Cholesky factorization of SPD matrices is that row swaps are never required, even when roundoff errors are taken into account, so there is no $P$ matrix.
We won't do much with Cholesky factorizations in 18.06, but they are extremely useful in many applications of linear algebra. If you are solving a system of equations whose matrix is SPD (for example, one involving $A^T A$), then you gain a factor of two or more by telling the computer to use Cholesky factorization instead of LU, and there are many other uses of Cholesky factorization as well.
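For example (a minimal sketch, reusing S and the hypothetical b from the LDLᵀ solve above), the factorization object returned by cholfact can be used just like the one from lufact:
F = cholfact(Symmetric(S))   # Cholesky factorization object: no row swaps, about half the work of LU
x = F \ b                    # two triangular solves, with L̂ and L̂ᵀ
S*x ≈ b                      # true, up to roundoff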