Max Kapur | maxkapur.com | Apr. 13, 2022

The school location problem is a kind of facility location problem. The city needs to build $n$ schools to serve $m$ families. Each family's location (in Euclidean space) is given by $p_j$, and their children will attend the school *closest* to home, so the distance they travel to school is
$$\min_{i = 1 \dots n}\left\{\lVert x_i - p_j \rVert_q \right\}.$$
Here $x_i$ is the location of school $i$ and $\lVert \cdot \rVert_q$ is some $q$-norm (we write $q$ rather than the usual $p$ to avoid a clash with the points $p_j$). In this demonstration, we will use the 2-norm, or Euclidean distance, but the 1-norm, or taxicab distance, may be more appropriate in an urban context.

Our goal is to choose the $x_i$ that *minimize,* in some sense, the distance between families and schools. Since we already assume that families attend the nearest school, this means we are dealing with some kind of "minimin" problem. "Minimin" and "maximax" problems tend to be more computationally challenging than "minimax" or "maximin" problems, because the pointwise maximum of convex functions is convex, whereas the pointwise minimum generally is not. Often, the most effective way to formulate a minimin problem is by adding binary variables that identify the minimum.
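For example, a constraint of the form $t \geq \min_{k = 1 \dots K}\{a_k\}$ can be encoded with binary variables $s_k$ and a sufficiently large constant $M$:
$$t \geq a_k - M(1 - s_k), \quad k = 1 \dots K, \qquad \sum_{k=1}^{K} s_k \geq 1, \qquad s_k \in \{0, 1\}.$$
Whichever $s_k$ equals one forces $t \geq a_k$ for that $k$, and hence $t \geq \min_k \{a_k\}$.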

In the school location problem, we will define the social cost as the *longest* distance that any student travels to school. The reason is that in a public planning problem such as this, it is important to prevent jealousy among the families. By minimizing the longest distance, we ensure that no single family has a longer commute than everyone else. (To see why this is true, suppose that some plan does give a single family the longest commute. Then their commute defines the social cost for this plan, and by nudging their school infinitesimally closer to them, we can decrease the social cost; therefore this plan is not optimal.) The plan that minimizes the average distance, for example, typically does not have this property.
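A one-dimensional example with toy numbers makes this concrete. Suppose a single school serves two families at positions $0$ and $10$. Minimizing the longest commute places the school at $x = 5$, where both families travel $5$. Minimizing the average commute, by contrast, is indifferent among all $x \in [0, 10]$, since the total distance is $10$ throughout; it would accept $x = 0$, giving one family a commute of $10$ and the other a commute of $0$.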

Incorporating the social cost function above, we can now write the school location problem. It has a delightful "minimaximin" form: $$\text{minimize}\quad \max_{j = 1 \dots m} \Bigl\{ \min_{i = 1 \dots n}\left\{\lVert x_i - p_j \rVert_2 \right\} \Bigr\}$$ To get this into a form that our solver can handle, we first let the variable $t$ stand in for the maximal distance: $$\begin{align} \text{minimize}\quad & t \\ \text{subject to}\quad &t \geq\min_{i = 1 \dots n}\left\{\lVert x_i - p_j \rVert_2 \right\}, &j = 1 \dots m \end{align}$$ Since $t$ is greater than or equal to all of the inputs to the maximum, it is greater than or equal to the value of the maximum itself. And since this is a minimization problem, $t$ will be decreased "automatically" until it equals the value of the maximum.

On the other hand, because this is *not* a maximization problem, the same trick doesn't work for getting rid of the minimum. Instead, let's introduce a binary variable $s_{ij}$ that equals $1$ if family $j$'s children attend school $i$, and zero otherwise. Now we can write the problem as follows.
$$\begin{align}
\text{minimize}\quad & t \\
\text{subject to}\quad &t \geq \lVert x_i - p_j \rVert_2 - M (1 - s_{ij}), & i = 1\dots n,~j = 1 \dots m \\
& \sum_{i=1}^n s_{ij} \geq 1, & j = 1\dots m \\
& s_{ij} \in \{0, 1\}, & i = 1\dots n,~j = 1 \dots m
\end{align}$$
Here $M$ is a large, positive constant. If $s_{ij} = 0$, then the first constraint is vacuous; if $s_{ij} = 1$, the first constraint activates as required. The second constraint simply says that every family must attend at least one school. Since setting an additional $s_{ij} = 1$ can only tighten the problem, there is always an optimal solution in which this constraint holds with equality.
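As a quick standalone sanity check (toy numbers, not part of the model), we can verify the switching behavior of the big-$M$ constraint directly:

```julia
# Toy check of the big-M switch: with s = 0 the constraint
# t ≥ d - M(1 - s) is vacuous; with s = 1 it forces t ≥ d.
M = 100.0         # any upper bound on d will do
d, t = 3.0, 1.0   # here t < d, so the activated constraint should fail
satisfied(s) = t ≥ d - M * (1 - s)
satisfied(0), satisfied(1)   # (true, false)
```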

We will use the Splitting Conic Solver (SCS) to solve the relaxations of this problem. The first constraint in the problem above is already convex, but the solver will process it more efficiently if we input it in the following conic form:
$$\begin{align}
\text{minimize}\quad & t \\
\text{subject to}\quad &
\begin{bmatrix}d_{ij} \\ x_i - p_j \end{bmatrix} \in C_2, & i = 1\dots n,~j = 1 \dots m \\
& t \geq d_{ij} - M (1 - s_{ij}), & i = 1\dots n,~j = 1 \dots m \\
& \sum_{i=1}^n s_{ij} \geq 1, & j = 1\dots m \\
& s_{ij} \in \{0, 1\}, & i = 1\dots n,~j = 1 \dots m
\end{align}$$
Here $C_2 = \bigl\{(h, y) \in \mathbb{R}^3 : h \geq \lVert y \rVert_2 \bigr\}$ is known as the *second-order cone.*

We will use Julia and the JuMP modeling language to express this problem as a second-order conic program (SOCP). We will use SCS to solve the relaxations, and the Juniper package to handle the binary variables.

In [9]:

```
using Plots
using JuMP
using Juniper, SCS
using LinearAlgebra
using Test
```

In [10]:

```
m = 20      # number of families
n = 3       # number of schools
ndims = 2   # number of Euclidean dimensions in which points are embedded

# Generate points approximately on the unit circle
ps = map(1:m) do _
    p = randn(ndims)
    normalize!(p)
    p += 0.1 * randn(ndims)
    return p
end

pl = plot(size=(600, 600), xlim=(-1.5, 1.5), ylim=(-1.5, 1.5))
scatter!(pl, [tuple(p...) for p in ps], m=:square, c=:black, msc=:auto, label="families")
```

Out[10]:

*(Plot: the family locations, scattered roughly around the unit circle.)*

Now we can start building the model in JuMP.

To ensure that the model is numerically stable, it is important to choose a value of $M$ that is as small as possible without breaking the logic of the constraints. In our case, it is sufficient to have $M \geq \lVert x_i - p_j \rVert_2$ for all $i$ and $j$. In an optimal plan, each school lies in the convex hull of the households (otherwise it could be projected onto the hull without increasing any distance), and since $\lVert x - p_j \rVert_2$ is convex in $x$, its maximum over the hull is attained at a household. Hence the distance between any school and any household will not exceed the largest distance between any two households, and we can set $$M = \max_{j \neq k} \bigl\{\lVert p_j - p_k \rVert_2\bigr\}.$$ In the code below, I compute this bound using the 1-norm to make it easy for the reader to try using the 1-norm as the distance criterion. This is a conservative choice, since the 1-norm of a vector is always at least its 2-norm.
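As a standalone sketch (with arbitrary made-up points), we can check that the 1-norm diameter always dominates the 2-norm diameter, so the bound is indeed conservative:

```julia
using LinearAlgebra

pts = [randn(2) for _ in 1:10]   # arbitrary sample points
M1 = maximum(norm(a - b, 1) for a in pts, b in pts)   # 1-norm diameter
M2 = maximum(norm(a - b, 2) for a in pts, b in pts)   # 2-norm diameter
M1 ≥ M2   # always holds, since ‖v‖₁ ≥ ‖v‖₂ for every v
```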

In [13]:

```
# Set SCS as the solver for the relaxation problems.
nl_solver = optimizer_with_attributes(SCS.Optimizer, "verbose" => 0)

# Wrap SCS in Juniper to enable integrality constraints.
mdl = Model(
    optimizer_with_attributes(
        Juniper.Optimizer,
        "nl_solver" => nl_solver,
        "atol" => 1e-3,
    )
)
set_silent(mdl)

# Set the value of big M, which needs to be an upper bound on d[i, j].
M = maximum(norm(ps[i] - ps[k], 1) for i in 1:m for k in i+1:m)
```

Out[13]:

2.960786190030774

Now we declare our variables and constraints. To ensure numerical stability, I have introduced the variable $d_{ij}$ to stand in for $\lVert x_i - p_j \rVert_2$. This will also be helpful for verifying the correctness of our solution later on.

Notice how easy it is to enter the conic constraint using JuMP's `MOI.SecondOrderCone()` notation.

In [14]:

```
@variable(mdl, t)
@variable(mdl, d[1:n, 1:m])
@variable(mdl, s[1:n, 1:m], Bin)
@variable(mdl, x[1:ndims, 1:n])

for j in 1:m
    for i in 1:n
        @constraint(mdl, [d[i, j]; x[:, i] .- ps[j]] in MOI.SecondOrderCone(ndims + 1))
        # To use the 1-norm (taxicab distance) instead:
        # @constraint(mdl, [d[i, j]; x[:, i] .- ps[j]] in MOI.NormOneCone(ndims + 1))

        # s[i, j] = 1 ⟹ d[i, j] ≤ t
        @constraint(mdl, t ≥ d[i, j] - M * (1 - s[i, j]))
    end
    @constraint(mdl, sum(s[i, j] for i in 1:n) ≥ 1)
end

@objective(mdl, Min, t)
mdl
```

Out[14]:

```
A JuMP Model
Minimization problem with:
Variables: 127
Objective function type: VariableRef
`AffExpr`-in-`MathOptInterface.GreaterThan{Float64}`: 80 constraints
`Vector{AffExpr}`-in-`MathOptInterface.SecondOrderCone`: 60 constraints
`VariableRef`-in-`MathOptInterface.ZeroOne`: 60 constraints
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Juniper
Names registered in the model: d, s, t, x
```

In [15]:

```
optimize!(mdl)
solution_summary(mdl)
```

Out[15]:

```
* Solver : Juniper

* Status
  Termination status : LOCALLY_SOLVED
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Message from the solver:
  "LOCALLY_SOLVED"

* Candidate solution
  Objective value : 0.6528703415101038
  Objective bound : 0.6528703415101038

* Work counters
  Solve time (sec) : 48.10764
```

Read out the solution and verify its correctness. Each entry of `xs` gives the coordinates of one school, and we store the index of the $j$th family's nearest school as `attend_school[j]`. We verify that this is correct in two ways:

- First, we check the `d[i, j]`-values and ensure that each `attend_school[j]` is the minimum of the corresponding column. (Note that the `d[i, j]` values only give the correct Euclidean distance to the *closest* school. If `s[i, j] == 0`, it is not necessarily true that `d[i, j] == norm(xs[i] - ps[j])`, because the corresponding big-$M$ constraint is not necessarily binding.)
- Next, we check that the `1` in each column of `s` occurs at the `attend_school[j]`th position.

In [16]:

```
xs = [value.(x[:, i]) for i in 1:n]
attend_school = map(ps) do p
    argmin(i -> norm(p - xs[i]), 1:n)
end
@test attend_school == map(argmin, eachcol(value.(d))) == map(argmax, eachcol(value.(s)))
```

Out[16]:

```
Test Passed
Expression: attend_school == map(argmin, eachcol(value.(d))) == map(argmax, eachcol(value.(s)))
Evaluated: [2, 3, 1, 2, 1, 1, 2, 3, 1, 3, 2, 1, 1, 3, 1, 1, 1, 2, 2, 3] == [2, 3, 1, 2, 1, 1, 2, 3, 1, 3, 2, 1, 1, 3, 1, 1, 1, 2, 2, 3] == [2, 3, 1, 2, 1, 1, 2, 3, 1, 3, 2, 1, 1, 3, 1, 1, 1, 2, 2, 3]
```

Finally, we make a plot showing the school locations and each family's local school.

In [17]:

```
pl = plot(size=(600, 600), xlim=(-1.5, 1.5), ylim=(-1.5, 1.5))
for (j, p) in enumerate(ps)
    plot!(
        pl,
        [tuple(p...), tuple(xs[attend_school[j]]...)],
        label = j == 1 ? "local school" : nothing,
        c = "navy",
        ls = :dash,
    )
end
scatter!(pl, [tuple(p...) for p in ps], m=:square, c=:black, msc=:auto, label="families")
scatter!(pl, [tuple(x...) for x in xs], c=:crimson, msc=:auto, ms=7, label="schools")
pl
```

Out[17]:

*(Plot: families, optimal school locations, and dashed lines connecting each family to its local school.)*

The output above suggests that this model finds intuitive "neighborhoods" in the input data and produces a school for each. In this regard, the problem is very similar to $k$-means clustering. However, the $k$-means clustering solution minimizes the *sum* of *squared* Euclidean distances between families and their schools, and therefore differs from our model. For example, if there is an outlier student who lives very far from the others, our model will place a school halfway between her family and the population center, whereas the $k$-means clustering solution will not be as drastically swayed. This means that the school location problem is not very *robust* to outliers. Such is the cost of fairness.
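The outlier effect is easy to see in the simplest possible setting (one school, one dimension, made-up data): the minimax location is the midpoint of the two extreme families, while the mean, which is the $k$-means center for a single cluster, stays near the bulk of the population.

```julia
using Statistics

pts = [0.0, 0.1, 0.2, 10.0]   # three clustered families and one outlier
minimax_loc = (minimum(pts) + maximum(pts)) / 2   # 5.0: dragged halfway to the outlier
mean_loc = mean(pts)                              # 2.575: much less swayed
```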

Because of its integrality constraints, the school location problem is very difficult to solve. This is another commonality with $k$-means clustering, which is known to be NP-hard.

Let's try using a heuristic algorithm for the $k$-means clustering problem (which, confusingly, is often called the "$k$-means algorithm") to produce an alternative solution and see how it differs from that above.

In [18]:

```
using Clustering

res = kmeans(hcat(ps...), n)
xs_kmeans = map(Vector{Float64}, eachcol(res.centers))
attend_school_kmeans = map(ps) do p
    argmin(i -> norm(p - xs_kmeans[i]), 1:n)
end

pl = plot(size=(600, 600), xlim=(-1.5, 1.5), ylim=(-1.5, 1.5))
for (j, p) in enumerate(ps)
    plot!(
        pl,
        [tuple(p...), tuple(xs_kmeans[attend_school_kmeans[j]]...)],
        label = j == 1 ? "local school" : nothing,
        c = "navy",
        ls = :dash,
    )
end
scatter!(pl, [tuple(p...) for p in ps], m=:square, c=:black, msc=:auto, label="families")
scatter!(pl, [tuple(x...) for x in xs_kmeans], c=:crimson, msc=:auto, ms=7, label="schools (k means)")
pl
```

Out[18]:

*(Plot: families and the $k$-means school locations, with each family connected to its nearest school.)*