Lots of you probably learned how to multiply matrices in high school.
We compute the product $C=AB$ of an $m \times n$ matrix $A$ with an $n \times p$ matrix $B$ to produce an $m \times p$ matrix $C$.
Did you ever wonder why "matmul" has such a fancy definition?
When we add matrices, we just add corresponding elements. Why couldn't matmul be just as easy?
Of course, the elementwise multiply is doable, but it never seems to be quite as important:
(I'll bet your high school teacher never mentioned elementwise multiply!)
A = [1 2
     3 4]
B = [7 2
     3 2]
@show(A .* B) # Elementwise times is the "dot star"
@show(A * B); # Matmul is just the "star"
A .* B = [7 4; 9 8]
A * B = [13 6; 33 14]
For square $n \times n$ matrices, elementwise multiply requires $n^2$ operations, while matmul requires about $2n^3$. (Think $n^2$ dot products, each requiring $n$ mults and almost $n$ adds.)
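To make the count concrete, here is a quick back-of-the-envelope check (a sketch of mine; the numbers just restate the formulas above):

n = 4                     # any size will do
@show n^2                 # elementwise multiply: one mult per entry
@show n^2 * (n + (n-1));  # matmul: n^2 dot products, each with n mults and n-1 adds

For $n = 4$ this gives 16 versus 112 operations, already a factor of 7.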
Why is matmul defined this way? We will find out later in the course when we begin to understand that a matrix represents a linear transformation, and matmul is the natural representation of the composition of transformations. It is only then you can understand the true nature of matrix multiplication. (Bet your high school teacher never told you that!)
One of our goals in 18.06 is to sometimes stop thinking of matrices as arrays of numbers, and instead to treat them as holistic objects.
Abstractly, the rules for matrix multiplication are determined once you define how to multiply matrices by vectors $Ax$, the central linear operation of 18.06, by requiring that multiplication be associative. That is, we require: $$ A(Bx)=(AB)x $$ for all matrices $A$ and $B$ and all vectors $x$. The expression $A(Bx)$ involves only matrix × vector (computing $y=Bx$ then $Ay$), and requiring this to equal $(AB)x$ actually uniquely defines the matrix–matrix product $AB$.
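As a quick numerical sanity check (a sketch with arbitrary random matrices; the names A₀, B₀, x₀ are mine), we can verify that $A(Bx)$ and $(AB)x$ agree:

A₀ = randn(3,3); B₀ = randn(3,3); x₀ = randn(3)   # a random 3×3 example
A₀*(B₀*x₀) ≈ (A₀*B₀)*x₀   # ≈ (type \approx + tab) tolerates floating-point roundoff; gives true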
The most familiar definition is that you take dot products of rows of $A$ with columns of $B$ to get the product $C$. For example: $$ \begin{pmatrix} -14 & 5 & 10 \\ \color{red}{-5} & -20 & 10 \\ -6 & 10 & 6 \end{pmatrix} = \begin{pmatrix} 2 & -1 & 5 \\ \color{red}{3} & \color{red}{4} & \color{red}{4} \\ -4 & -2 & 0 \end{pmatrix} \begin{pmatrix} \color{red}{1} & 0 & -2 \\ \color{red}{1} & -5 & 1 \\ \color{red}{-3} & 0 & 3 \end{pmatrix} $$ where we have highlighted the entry $\color{red}{-5 = 3 \times 1 + 4 \times 1 + 4 \times (-3)}$ (second row of $A$ ⋅ first column of $B$).
This can be written out as the formula $$ c_{ij} = \sum_{k=1}^n a_{ik} b_{kj} $$ in terms of the entries of the matrices, where $c_{ij}$ is the entry in row $i$, column $j$ of $C$, assuming $A$ has $n$ columns and $B$ has $n$ rows.
Essentially all matrix multiplications in practice are done with a version of this formula — at least, with the same operations, but often the order in which you multiply/add individual numbers is re-arranged.
In this notebook, we will explore several ways to think about these operations by re-arranging their order.
A = [ 2 -1 5
3 4 4
-4 -2 0]
B = [ 1 0 -2
1 -5 1
-3 0 3]
C = A * B
3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6
Because matrix multiplication is generally not commutative, $AB$ and $BA$ give different matrices:
B*A
3×3 Array{Int64,2}:
  10    3    5
 -17  -23  -15
 -18   -3  -15
A*B - B*A
3×3 Array{Int64,2}:
 -24   2   5
  12   3  25
  12  13  21
Though occasionally a particular pair of matrices happens to commute. For example, $A$ commutes with any polynomial in $A$ and $A^{-1}$:
A*(A^2 + 2*A + inv(A)*10)
3×3 Array{Float64,2}:
 -138.0  -89.0  -147.0
 -101.0  -24.0    92.0
   44.0   46.0  -132.0
(A^2 + 2*A + inv(A)*10) * A
3×3 Array{Float64,2}:
 -138.0  -89.0  -147.0
 -101.0  -24.0    92.0
   44.0   46.0  -132.0
If we want, we can compute the individual dot products in Julia too. For example, let's compute $c_{2,1} = -5$ (the 2nd row and 1st column of $C$, or C[2,1] in Julia) by taking the dot product of the second row of $A$ with the first column of $B$.
To extract rows and columns of a matrix, Julia supports a syntax for "array slicing" pioneered by APL. The second row of $A$ is A[2,:], and the first column of $B$ is B[:,1]:
A
3×3 Array{Int64,2}:
  2  -1  5
  3   4  4
 -4  -2  0
A[2,3] # 2nd row, 3rd col
4
A[2,:] # 2nd row of A
3-element Array{Int64,1}:
 3
 4
 4
B
3×3 Array{Int64,2}:
  1   0  -2
  1  -5   1
 -3   0   3
B[:,1] # 1st column of B
3-element Array{Int64,1}:
  1
  1
 -3
Now we can compute $c_{2,1}$ by their dot product via the dot function:
dot(A[2,:], B[:,1])
-5
We can also write the dot product with the $x \cdot y$ notation:
A[2,:] ⋅ B[:,1] # type \cdot + tab
-5
α = A[2,3] # \alpha + tab
4
This matches $c_{2,1}$ from above, or C[2,1] in Julia:
C[2,1]
-5
A[2,:]' * B[:,1] # yet another way to get a dot product
-5
function matmul_ijk(A,B)
    m,n = size(A)
    n2,p = size(B)
    if n≠n2 error("No good, n=$n ≠ n2=$(n2)") end
    C = fill(0,m,p)   # m × p "zeros" matrix
    for i=1:m, j=1:p, k=1:n
        C[i,j] += A[i,k]*B[k,j]   # shorthand for C[i,j] = C[i,j] + A[i,k]*B[k,j]
    end
    return C
end
matmul_ijk (generic function with 1 method)
matmul_ijk(A,B)
3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6
$AB$ can be viewed as $A$ multiplying each column of $B$: column $j$ of $C$ is $A$ times column $j$ of $B$.
For example, let's multiply $A$ by the first column of $B$:
A*B
3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6
B[:,1]
3-element Array{Int64,1}:
  1
  1
 -3
A * B[:,1]
3-element Array{Int64,1}:
 -14
  -5
  -6
using Interact
@manipulate for j=1:3
A * B[:,j]
end
This is the first column of $C$! If we do this to all the columns of $B$, we get $C$:
[ A*B[:,1] A*B[:,2] A*B[:,3] ]
3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6
A*B
3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6
Equivalently, each column of $B$ specifies a linear combination of columns of $A$ to produce the columns of $C$. So, if you want to rearrange the columns of a matrix, multiply it by another matrix on the *right*.
For example, let's do the transformation that flips the sign of the first column of $A$ and swaps the second and third columns.
A
3×3 Array{Int64,2}:
  2  -1  5
  3   4  4
 -4  -2  0
A * [ -1 0 0
0 0 1
0 1 0 ]
3×3 Array{Int64,2}:
 -2  5  -1
 -3  4   4
  4  0  -2
As another example, let's swap the first two columns:
A * [ 0 1 0
1 0 0
0 0 1 ]
3×3 Array{Int64,2}:
 -1   2  5
  4   3  4
 -2  -4  0
A
3×3 Array{Int64,2}:
  2  -1  5
  3   4  4
 -4  -2  0
A lot of students are perplexed by the idea of re-ordering the i, j, k loops in matmul_ijk above: how could it be legal to run them in a different order? It might take working through a few examples by hand to realize that, from the perspective of each entry C[i,j], the same sum is accumulated in the same order; only the order in which the different entries of C finish varies. The little Julia demo below may help with this understanding.
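Here is a small sketch of such a demo (the function name matmul_kij and the particular loop order are my choices): the same triple loop as matmul_ijk, but with k outermost instead of innermost. Each C[i,j] accumulates exactly the same terms, so the result is unchanged.

function matmul_kij(A,B)
    m,n = size(A)
    n2,p = size(B)
    if n≠n2 error("No good, n=$n ≠ n2=$(n2)") end
    C = fill(0,m,p)
    for k=1:n, i=1:m, j=1:p       # k outermost instead of innermost
        C[i,j] += A[i,k]*B[k,j]   # same update as in matmul_ijk
    end
    return C
end
matmul_kij(A,B) == matmul_ijk(A,B) == A*B   # true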
$AB$ can be viewed as multiplying each row of $A$ by the matrix $B$ on the right. Multiplying a row vector by a matrix on the right produces another row vector.
For example, here is the first row of $A$:
A[1,:]
3-element Array{Int64,1}:
  2
 -1
  5
Whoops, slicing a matrix in Julia produces a 1d array, which is interpreted as a column vector, no matter how you slice it. We can't multiply a column vector by a matrix $B$ on the right — that operation is not defined in linear algebra (the dimensions don't match up). Julia will give an error if we try it:
A
3×3 Array{Int64,2}:
  2  -1  5
  3   4  4
 -4  -2  0
A[1,:] * B
DimensionMismatch("matrix A has dimensions (3,1), matrix B has dimensions (3,3)") Stacktrace: [1] _generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:492 [2] generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:483 [3] *(::Array{Int64,1}, ::Array{Int64,2}) at ./linalg/matmul.jl:87 [4] include_string(::String, ::String) at ./loading.jl:522 [5] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/stevenj/.julia/v0.6/IJulia/src/execute_request.jl:193 [6] (::Compat.#inner#14{Array{Any,1},IJulia.#execute_request,Tuple{ZMQ.Socket,IJulia.Msg}})() at /Users/stevenj/.julia/v0.6/Compat/src/Compat.jl:332 [7] eventloop(::ZMQ.Socket) at /Users/stevenj/.julia/v0.6/IJulia/src/eventloop.jl:8 [8] (::IJulia.##13#16)() at ./task.jl:335
To get a row vector we must transpose the slice A[1,:]. In linear algebra, the transpose of a vector $x$ is usually denoted $x^T$. In Julia, the transpose is x.'.

If we omit the . and just write x', it is the complex-conjugate of the transpose, sometimes called the adjoint, often denoted $x^H$ (in matrix textbooks), $x^*$ (in pure math), or $x^\dagger$ (in physics). For real-valued vectors (no complex numbers), the conjugate transpose is the same as the transpose, and correspondingly we usually just write x' for real vectors.
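For example (a small illustration with made-up complex numbers), the two differ for a complex vector:

z = [1+2im, 3-im]
@show(z.')   # transpose:            [1+2im  3-1im]
@show(z')    # conjugate transpose:  [1-2im  3+1im]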
A[1,:]'
1×3 RowVector{Int64,Array{Int64,1}}:
 2  -1  5
Now, let's multiply this by $B$, which should give the first row of $C$:
A[1,:]' * B
1×3 RowVector{Int64,Array{Int64,1}}:
 -14  5  10
C
3×3 Array{Int64,2}:
 -14    5  10
  -5  -20  10
  -6   10   6
Yup!
Note that multiplying a row vector by a matrix on the left doesn't make sense: the dimensions don't match up, and Julia will give an error:
B * A[1,:]'
DimensionMismatch("matrix A has dimensions (3,3), matrix B has dimensions (1,3)") Stacktrace: [1] _generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:492 [2] generic_matmatmul!(::Array{Int64,2}, ::Char, ::Char, ::Array{Int64,2}, ::Array{Int64,2}) at ./linalg/matmul.jl:483 [3] A_mul_Bc(::Array{Int64,2}, ::Array{Int64,1}) at ./linalg/matmul.jl:86 [4] include_string(::String, ::String) at ./loading.jl:522 [5] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/stevenj/.julia/v0.6/IJulia/src/execute_request.jl:193 [6] (::Compat.#inner#14{Array{Any,1},IJulia.#execute_request,Tuple{ZMQ.Socket,IJulia.Msg}})() at /Users/stevenj/.julia/v0.6/Compat/src/Compat.jl:332 [7] eventloop(::ZMQ.Socket) at /Users/stevenj/.julia/v0.6/IJulia/src/eventloop.jl:8 [8] (::IJulia.##13#16)() at ./task.jl:335
If we multiply $B$ on the right by all the rows of $A$, we get $C$ again:
[ A[1,:]'*B
A[2,:]'*B
A[3,:]'*B ] == C
true
Equivalently, each row of $A$ specifies a linear combination of rows of $B$ to produce the rows of $C$. So, if you want to rearrange the rows of a matrix, multiply it by another matrix on the *left*.
For example, let's do the transformation that adds two times the first row of $B$ to the third row, and leaves the other rows untouched. This resembles one of the steps of Gaussian elimination!
B
3×3 Array{Int64,2}:
  1   0  -2
  1  -5   1
 -3   0   3
[ 1 0 0
0 1 0
2 0 1 ] * B
3×3 Array{Int64,2}:
  1   0  -2
  1  -5   1
 -1   0  -1
In fact, we can write one step of Gaussian elimination on $B$ in exactly this form: let's do the transformation that adds three times the first row of $B$ to the third row, subtracts the first row from the second, and leaves the first row untouched.
[ 1 0 0
-1 1 0
3 0 1 ] * B
3×3 Array{Int64,2}:
 1   0  -2
 0  -5   3
 0   0  -3
As desired, this "eliminates" the sub-diagonal entries of the first column of $B$. The nice thing about writing elimination in matrix form, instead of in words, is that it lets us reason about the whole elimination process algebraically.
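For instance (a sketch of mine; E1 and E2 are my names), the single matrix above is just the product of two one-step elimination matrices, one per row operation, so composing row operations is itself a matmul:

E1 = [ 1 0 0
      -1 1 0
       0 0 1 ]   # subtract row 1 from row 2
E2 = [ 1 0 0
       0 1 0
       3 0 1 ]   # add 3 × (row 1) to row 3
@show(E2*E1)                   # equals the combined elimination matrix used above
@show(E2*(E1*B) == (E2*E1)*B)  # true: associativity again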
The key to one more perspective on matmul is to observe: $AB$ can also be computed as a sum of rank-1 matrices (outer products), (column $k$ of $A$) times (row $k$ of $B$), summed over $k$.
(See this excellent paper by Gil Strang for more on this perspective applied to linear algebra. You will be in a better position to understand this at the end of 18.06, however.)
For example, here is column 1 of $A$ times row 1 of $B$:
A[:,1] * B[1,:]'
3×3 Array{Int64,2}:
  2  0  -4
  3  0  -6
 -4  0   8
If we do this for all three rows and columns and add them up, we get $C$:
A[:,1] * B[1,:]' + A[:,2] * B[2,:]' + A[:,3] * B[3,:]' == C
true
So, from this perspective, we could write:
$$ AB = \sum_{k=1}^3 (\mbox{column } k \mbox{ of } A) (\mbox{row } k \mbox{ of } B) = \sum_{k=1}^3 A[:,k] \, B[k,:]^T $$where in the last expression we have used Julia notation for slices.
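Here is that sum written as a loop (a sketch of mine; the name matmul_outer is my choice):

function matmul_outer(A,B)
    m,n = size(A)
    n2,p = size(B)
    if n≠n2 error("No good, n=$n ≠ n2=$(n2)") end
    C = fill(0,m,p)   # integer zeros, matching the integer A and B used here
    for k=1:n
        C += A[:,k] * B[k,:]'   # rank-1 outer product: (column k of A)(row k of B)
    end
    return C
end
matmul_outer(A,B) == A*B   # true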
It turns out that all of the above are special cases of a more general rule, by which we can break up a matrix into "submatrix" blocks and multiply the blocks. Rows, columns, etc. are just blocks of different shapes. We will do more of this later in the course.