This notebook is based on section 10.1 of Strang's Linear Algebra textbook.
One interesting source of large matrices in linear algebra is a [graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics), a collection of nodes (vertices) and edges (arrows from one vertex to another). Graphs are used in many applications to represent relationships and connectivity, such as:
In this notebook, we explain how a graph can be represented by a matrix, and how linear algebra can tell us properties of the graph and can help us do computations on graph-based problems. There is a particularly beautiful connection to Kirchhoff's laws of circuit theory.
To run the code in this notebook, you'll need to install a few Julia packages used below. To do so, uncomment the following line and run it:
# Pkg.add.(["LightGraphs", "MetaGraphs", "GraphPlot", "NamedColors", "RowEchelon", Interact", "SymPy"])
...and then run this cell to import the packages:
using Interact, RowEchelon, LightGraphs, MetaGraphs, GraphPlot, NamedColors, LinearAlgebra
import SymPy
using SymPy: Sym
There are several Julia packages for manipulating graphs, e.g. LightGraphs, along with several packages for visualizing graphs, e.g. GraphViz. LightGraphs is oriented towards fast and sophisticated graph computations, however, and here I just want to do some simple and pretty visualizations with simple algorithms based on those in Strang's 18.06 textbook.
So, here I define a simple MyGraph
wrapper around LightGraphs directed graphs, with metadata attached via the MetaGraphs package, for basic plotting via the GraphPlots package.
struct MyGraph
g::MetaDiGraph
end
Base.copy(mg::MyGraph) = MyGraph(copy(mg.g))
function MyGraph(edges::Pair{<:Integer,<:Integer}...)
g = SimpleDiGraphFromIterator(Edge(e) for e in edges)
MyGraph(MetaDiGraph(g))
end
using Random
function deterministic_spring_layout(g::AbstractGraph; seed::Integer=0, kws...)
rng = MersenneTwister(seed)
spring_layout(g, 2 .* rand(rng, nv(g)) .- 1.0, 2 .* rand(rng,nv(g)) .- 1.0; kws...)
end
function Base.show(io::IO, m::MIME"image/svg+xml", mg::MyGraph)
show(io, m,
gplot(mg.g, layout=deterministic_spring_layout,
nodelabel=map(v -> get(MetaGraphs.props(mg.g, v), :label, v), vertices(mg.g)),
nodefillc=map(v -> get(MetaGraphs.props(mg.g, v), :color, "gray"), vertices(mg.g)),
edgelabel=map(ie -> get(MetaGraphs.props(mg.g, ie[2]), :label, ie[1]), enumerate(edges(mg.g))),
edgestrokec=map(e -> get(MetaGraphs.props(mg.g, e), :color, "lightgray"), edges(mg.g)),
))
end
function nodecolors!(g::MyGraph, nodes::AbstractVector{<:Integer}, color::String="red")
for n in nodes
set_prop!(g.g, n, :color, color)
end
g
end
nodecolors(g, nodes, color) = nodecolors!(copy(g), nodes, color)
edgearr(g, e) = e
edgearr(g, e::AbstractVector{<:Integer}) = collect(edges(g))[e]
edgearr(g, e::AbstractVector{<:Pair}) = Edge.(e)
function edgecolors!(g::MyGraph, edges::AbstractVector, color::String="red")
for e in edgearr(g.g, edges)
set_prop!(g.g, e, :color, color)
end
g
end
edgecolors(g::MyGraph, edges::AbstractVector, color::String="red") = edgecolors!(copy(g), edges, color)
# A little code so that we can label graph nodes/edges with SymPy expressions.
# convert strings like "v_2 - v_0" from SymPy to nicer Unicode strings like "v₂ - v₀"
subchar(d::Integer) = Char(UInt32('₀')+d)
subchar(c::Char) = subchar(UInt32(c)-UInt32('0'))
subchar(s::String) = replace(s, r"_[0-9]" => s -> subchar(s[2]))
labelstring(s::SymPy.Sym) = subchar(repr("text/plain", s))
labelstring(x) = x
function labels!(g::MyGraph; edges=nothing, nodes=nothing)
if edges !== nothing
for (e,E) in zip(MetaGraphs.edges(g.g), edges)
set_prop!(g.g, e, :label, labelstring(E))
end
end
if nodes !== nothing
for (n,N) in zip(vertices(g.g), nodes)
set_prop!(g.g, n, :label, labelstring(N))
end
end
g
end
labels(g::MyGraph; kws...) = labels!(copy(g); kws...)
labels (generic function with 1 method)
# generate a random graph with a given average #edges per node
function randgraph(numnodes::Integer, edgespernode::Real)
p = edgespernode/numnodes # probability of each edge
e = Vector{Pair{Int,Int}}()
for i = 1:numnodes, j = 1:numnodes
if i != j && rand() < p
push!(e, i=>j)
end
end
return MyGraph(e...)
end
randgraph (generic function with 1 method)
# returns the incidence matrix for g
function incidence(g::MyGraph)
A = zeros(Int, ne(g.g), nv(g.g))
for (i,e) in enumerate(edges(g.g))
A[i,e.src] = -1
A[i,e.dst] = +1
end
return A
end
incidence (generic function with 1 method)
# Find the loops in g by the simplest "textbook" manner:
# get a basis for the left nullspace incidence matrix.
# We do this via the rref form, rather than nullspace(A'), because
# we want a "nice" basis of ±1 and 0 entries.
function leftnullspace(g::MyGraph)
A = incidence(g)
R = rref(Matrix(A'))
m, n = size(R)
pivots = Int[]
for i = 1:m
j = findfirst(!iszero, R[i,:])
j !== nothing && push!(pivots, j)
end
r = length(pivots) # rank
free = Int[j for j=1:n if j ∉ pivots]
N = zeros(Int, n, n-r)
k = 0
for (k,j) in enumerate(free)
N[pivots, k] = -R[1:r, j]
N[j, k] = 1
end
return N
end
leftnullspace (generic function with 1 method)
# color the edges of a spanning tree of g, by the textbook
# method of finding the pivot rows of the incidence matrix
function pivotrows(g::MyGraph)
A = incidence(g)
R = rref(Matrix(A'))
m, n = size(R)
pivots = Int[]
for i = 1:m
j = findfirst(!iszero, R[i,:])
j !== nothing && push!(pivots, j)
end
return pivots
end
colortree(g::MyGraph, color::String="red") = edgecolors(g, pivotrows(g), color)
tree(g::MyGraph) = MyGraph(MetaDiGraph(SimpleDiGraphFromIterator(edgearr(g.g, pivotrows(g)))))
tree (generic function with 1 method)
Let's start by looking at an example graph with 6 nodes 8 edges. Computers are pretty good at drawing graphs for us:
g = MyGraph(1=>4, 4=>5, 5=>6, 6=>3, 3=>2, 2=>1, 2=>6, 4=>6)
A key way to represent a graph in linear algebra is the incidence matrix. As defined in Strang's textbook, this is a matrix where the rows correspond to edges and the columns correspond to nodes. (Some authors use the transpose of this instead.)
In particular, in the row for each edge going from node N to node M, there is a -1 in column N and a +1 in column N.
For example, the incidence matrix of the graph above is:
A = incidence(g)
8×6 Array{Int64,2}: -1 0 0 1 0 0 1 -1 0 0 0 0 0 -1 0 0 0 1 0 1 -1 0 0 0 0 0 0 -1 1 0 0 0 0 -1 0 1 0 0 0 0 -1 1 0 0 1 0 0 -1
There is an interesting structure if you think about loops in the graph. For example, in the graph above there is a loop among nodes 6, 3, 2, via edges 4,3,8. Let's look at the rows of $A$ corresponding to those edges:
A[[4,3,8],:]
3×6 Array{Int64,2}: 0 1 -1 0 0 0 0 -1 0 0 0 1 0 0 1 0 0 -1
If we add these rows we get zero:
A[4,:]' + A[3,:]' + A[8,:]'
1×6 Adjoint{Int64,Array{Int64,1}}: 0 0 0 0 0 0
In general, it is easy to see that any loop in the graph corresponds to dependent rows: if we sum the rows going around the loop (with a minus sign for arrows in the wrong direction), we get zero.
The reason is simple: we get a -1 in a column when we leave a node, and a +1 in the column when we enter a node. When we go around the loop, we leave and enter each node, so the sum is zero.
But dependent rows correspond to elements of the left nullspace:
[0 0 1 1 0 0 0 1] * A
1×6 Array{Int64,2}: 0 0 0 0 0 0
That means that the number of "independent" (primitive) loops in a graph is related to the rank of the incidence matrix, and the independent rows of A have no loops.
Let's look at the row-reduced echelon (rref) form of $A^T$:
Matrix{Int}(rref(Matrix(A')))
6×8 Array{Int64,2}: 1 0 0 0 0 -1 -1 0 0 1 0 0 0 -1 -1 0 0 0 1 0 0 1 1 -1 0 0 0 1 0 0 0 -1 0 0 0 0 1 0 -1 0 0 0 0 0 0 0 0 0
We can see that the rank of $A$ is 5:
rank(A)
5
This means that there are five loop-free (independent) edges, and there are three (8 - 5) primitive loops. Using the rref form of $A^T$, we can read off a basis for the left nullspace from the free columns (6,7,8):
N = leftnullspace(g)
8×3 Array{Int64,2}: 1 1 0 1 1 0 -1 -1 1 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1
A' * N
6×3 Array{Int64,2}: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Let's visualize these loops by plotting the edges in a different color (red) one by one, with help from the Interact package to give us an interactive widget:
colorloop(g::MyGraph, n::Vector) = edgecolors!(edgecolors(g, findall(n .> 0), "red"), findall(n .< 0), "blue")
function animloops(g::MyGraph)
L = leftnullspace(g)
@manipulate for loop in 1:size(L,2)
colorloop(g, L[:,loop])
end
end
animloops (generic function with 1 method)
animloops(g)
These three loops are not the only loops in the graph, but the other loops can be made from combinations of these loops. (Similarly, the columns of $N$ are not the whole left nullspace, they are just a basis for the n nullspace.
For example, the loop between nodes 1-4-5-6-3-2 can be made by starting with 1-4-5-6-2 and "adding" the 6-3-2 loop.
In this sense, a basis for the left nullspace of $A$ is a "basis" for the other loops in the graph: we say that they are "primitive" loops.
colorloop(g, N[:,2] + N[:,3]) # add two loops to make another loop
It is fun to do the same thing for bigger graphs, chosen at random:
gbig = randgraph(15, 2)
animloops(gbig)
Conversely, the independent rows of $A$ (corresponding to the pivot columns of the rref form of $A^T$) form a maximal set of edges with no loops. A graph with no loops is called a tree, and this particular tree is called a spanning tree because it touches all of ("spans") the nodes (assuming the graph is connected).
Let's color the spanning tree (loop-free edges) of our example graph red:
colortree(g)
We can also discard all of the edges that are not in the spanning tree, and we are left with a more boring graph of just the spanning tree:
tree(g)
We can do the same thing for our bigger random example:
colortree(gbig)
tree(gbig)
And we can make trees from even larger graphs, for fun:
tree(randgraph(50, 8))
An elegant application of the incidence matrix and its subspaces arises if we think of the graph as representing an electrical circuit:
Let's visualize this by re-labeling our graph from above. We'll use the SymPy package to allow us to do symbolic (not numeric) calculations with the incidence matrix.
labels(g, edges=[Sym("i_$i") for i = 1:size(A,1)], nodes=[Sym("v_$i") for i = 1:size(A,2)])
Let's start doing some linear algebra. What happens if we multiply our incidence matrix $A$ by a vector of voltages, one per node?
v = [Sym("v_$i") for i = 1:size(A,2)]
A
8×6 Array{Int64,2}: -1 0 0 1 0 0 1 -1 0 0 0 0 0 -1 0 0 0 1 0 1 -1 0 0 0 0 0 0 -1 1 0 0 0 0 -1 0 1 0 0 0 0 -1 1 0 0 1 0 0 -1
A * v
What we get are the voltage difference (and in particular, the voltage rise) across each edge. It is easier to see this if we use the elements of $Av$ to directly label the edges of our graph:
labels(g, edges=A*v, nodes=v)
Now, let's ask the inverse question: what voltage differences $d=Av$ can possibly arise? i.e. what $d$ are in $C(A)$?
Remember, $A$ is not full rank: its rank is 5, but there are 8 rows (8 edges). So, $C(A)$ is 5-dimensional ("missing" three dimensions). Equivalently $C(A)$ is orthogonal to the left nullspace, which has three rows. What does this mean?
Let's visualize the differences d:
d = [Sym("d_$j") for j = 1:size(A,1)]
labels(g, edges=d, nodes=v)
If $N$ is a basis for the left nullspace, we must have $N^T d = 0$, or:
N' * d
But what is this? Remember, each element of the left nullspace corresponded to a loop in the graph. Saying $N^T d = 0$, or $d \perp N(A^T)$, is equivalent to saying that the sum of the voltage rises around each loop = 0.
But this is precisely Kirchhoff's voltage law from circuit theory!
To actually solve circuit problems, we need three additional ingredients:
The voltage difference $d$ must be divided by a resistance $R$ to get the current $i$ through that edge: $i = -d/R = -Yd$ (where $Y=1/R$ is the "admittance"), by Ohm's law. Note that we need a minus sign to get the current in the direction of the arrow, since $d$ was the the voltage rise across the edge.
The sum of the currents $i$ entering each node must be zero, by Kirchhoff's current law (KCL).
To get a nontrivial solution, we need some kind of source: a battery or current source, to start currents flowing.
How do we represent each one of these steps by linear-algebra operations?
To represent Ohm's law, we need to multiply the voltage differences $d=Av$ by a diagonal matrix of admittances:
Y = diagm(0=>[Sym("Y_$i") for i = 1:size(A,1)])
Y*d
Y*A*v
Given the currents $i$, a little thought shows that the net current flowing into each node is precisely $A^T i$:
i = [Sym("i_$j") for j=1:size(A,1)]
A'*i
labels(g, edges=i, nodes=A'*i)
Why is this? The reason is that each row $A^T$ corresponds to a node, and has $\pm 1$ for each edge going into or out of the node, exactly the right sign to sum the net currents flowing in:
A'
6×8 Adjoint{Int64,Array{Int64,2}}: -1 1 0 0 0 0 0 0 0 -1 -1 1 0 0 0 0 0 0 0 -1 0 0 0 1 1 0 0 0 -1 -1 0 0 0 0 0 0 1 0 -1 0 0 0 1 0 0 1 1 -1
Putting it together, given voltages $v$, the net current flowing out of each node is
$$ A^T Y A v $$The matrix $A^T Y A$ is a very special and important kind of matrix. It is obviously symmetric, and later on in the course we will see that any matrix of this form is necessarily positive semidefinite (all pivots are ≥ 0). Many important matrices in science, engineering, statistics, and other fields take on this special form.
If we multiply $A^T Y A$ together, not all of its specialness is apparent. It is often better to leave it in "factored" form:
A' * Y * A
If we just say that the net current flowing out of each node is zero, we get the equation: $$ A^T Y A v = 0 $$ or $v \in N(A^T Y A) = N(A)$.
It is an amazing and important fact that $N(A^T Y A) = N(A)$!! (You saw a version of this in homework.) Why is this? Clearly, if $Ax = 0$ then $A^T Y Ax=0$. But what about the converse? Here is a trick: if $A^T Y Ax =0$, then $x^T A^T Y A x=0 = (Ax)^T Y (Ax)$. Let $y=Ax$. It is easy to see that $y^T Y y = \sum_i Y_i y_i^2 = 0$ only if $y=0$, since all of the admittances $Y_i$ are positive. (We will later say that $Y$ is a "positive-definite matrix".) This means that $A^T Y Ax =0$ implies that $y=Ax=0$, which implies that $x \in N(A)$.
What is $N(A)$? The rank of $A$ is 5, so $N(A)$ must be 1-dimensional. A basis for it is:
nullspace(A)
6×1 Array{Float64,2}: -0.4082482904638629 -0.408248290463863 -0.4082482904638628 -0.40824829046386313 -0.4082482904638631 -0.4082482904638629
But this is, of course, just the space of vectors where all voltages are equal. In hindsight, this should be obvious: if all the voltages are equal, then their difference are zero, and the currents are zero, and KCL is satisfied.
Of course, it is much more interesting to think about circuits when the currents are nonzero!
To do this, we must consider a source term in the equations, and in particular we could try to solve
$$ A^T Y A v = s $$for some $s\ne 0$. What does $s$ represent? It is precisely an external source of current flowing out of each node.
For this to have a solution, however, we must have $s \in C(A^T Y A) = N((A^T Y A)^T)^\perp = N(A^T Y A)^\perp = N(A)^\perp$ (since $A^T Y A$ is symmetric, the left and right nullspaces are equal). We know a basis for $N(A)$ from above, so this boils down to:
$$ \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} s = 0 = \sum_{i=1}^6 s_i $$That is, to have a solution, all current that flows in must flow out, so that the net current flowing into the circuit is zero. This makes a lot of physical sense!
Just for fun, let's solve this circuit problem when the current is flowing into node 2 and out through node 1, with slider controls for the 8 admittances, and label the edges with the currents.
twodigits(x) = round(x, digits=2)
twodigits (generic function with 1 method)
@manipulate for Y₁=0.1:0.1:10,
Y₂=0.1:0.1:10,
Y₃=0.1:0.1:10,
Y₄=0.1:0.1:10,
Y₅=0.1:0.1:10,
Y₆=0.1:0.1:10,
Y₇=0.1:0.1:10,
Y₈=0.1:0.1:10
s = [1,-1,0,0,0,0]
Y = diagm(0=>[Y₁,Y₂,Y₃,Y₄,Y₅,Y₆,Y₇,Y₈])
nodecolors!(labels(g, edges=[subchar("i_$j = $i") for (j,i) in enumerate(twodigits.(Y*A*(pinv(A'*Y*A) * s)))]),
[1,2])
end
Notice that if we make the admittance $Y_2$ really large compared to all of the other admittances, then nearly all of the current should flow just over that one edge. Conversely, if we make $Y_2$ really small, it is almost like "cutting" that wire: almost all of the current should flow through the other edges.
Hooray, math (and physics) works!
The case of matrices arising from graphs illustrates another point that I've made many times: really large matrices are often sparse (mostly 0) in practice.
For example, imagine a circuit with a million nodes. For the most part, there will only be wires between nearby nodes. Or imagine a graph where the nodes are websites and the edges are links: there are billions of sites, but each site only links to a few other sites (a few hundred at most, usually). In such cases, the incidence matrix is mostly zero, and similarly for $A^T Y A$ etcetetera.
This is hugely important, because solving $Ax=b$ and most other matrix equations scale as $\sim n^3$ for $n \times n$ matrices. $1000 \times 1000$ matrices are easy (< 1 second), but $n=10^6$ would require supercomputers, and $n=10^9$ would be impossibly hard. What saves us is that there are much faster algorithms for sparse matrices. We won't learn much about such algorithms in 18.06, but the key point is to know that they exist.
If you encounter a large sparse matrix problem in the future, go read about sparse matrix algorithms!