Solving the JobShop scheduling problem using Artelys Kalis

Scheduling problems are concerned with determining a plan for the execution of a given set of tasks. Addressing these challenges effectively is crucial in the industry, as it allows to reduce cost without compromising quality. One classical scheduling problem is the Job-Shop, in which $n$ jobs must be completed using $m$ machines. Each job is composed of multiple tasks of varying execution time, which must be processed in a given order and on specific machines. The aim is to find a schedule that minimizes the makespan (total length of the process).

Because of its highly combinatorial nature, Constraint Programming (CP) is particularly interesting for solving this problem. In the following we describe the JobShop problem and present how to state and solve it using the CP solver Artelys Kalis.

Description of the problem


A problem instance $P = (M,T,J)$ consists of:

  • A set $M$ of machines
  • A set $T$ of tasks
  • A set $J$ of jobs.

Each job $i \in J$ is composed of a subset $T_i$ of $\tau_i$ tasks. Hence a job can be represented as a sequence $\{task(i,1), task(i,2),...,task(i,\tau_i)\}$.

Each $task(i,j)$ is associated with a machine $m_{ij} \in M$, and has a duration $d_{ij}$.


$t_{ij}$ = starting time of $task(i,j)$, $\forall i \in J, \forall j \in T_i$.


We seek a schedule that minimizes the makespan, that is the total duration between the start of the first task to be processed across all machines, and the end of the last one.

The makespan, often called $C_{max}$, can be defined as the maximum completion time (starting time plus duration), across all tasks:

$$C_{max} = \max\limits_{i,j}\ (t_{ij}+d_{ij})$$


Starting times are non negative and we set a time horizon $MaxTime$. For instance, this horizon could represent one working day.

$t_{i,j} \geq 0, \quad \forall i \in J, \forall j \in T_i $

$t_{i,j} + d_{i,j} \leq MaxTime, \quad \forall i \in J, \forall j \in T_i$

Given two consecutive tasks in the same job, the first one must be completed before the next one can start. Those are called conjunctive constraints.

$t_{i,j} + d_{i,j} \leq t_{i,j+1}, \quad \forall i \in J, \forall j \in T_i$

A machine can process only one task at a time. As a result, if $task(i,j)$ and $task(k,l)$ are associated with the same machine ($m_{ij}$ = $m_{kl}$), then one of them must be completed before the machine is available again to process the other one. Those are called disjunctive constraints:

$t_{i,j} + d_{i,j} \leq t_{k,l} \quad$ or $\quad t_{k,l} + d_{k,l} \leq t_{i,j} \quad \forall i,k \in J, \forall j \in T_i, \forall l \in T_k$ such that $m_{ij} = m_{kl}$

Implementation using the Python API

Model data

Let us start by taking a look at the data.

In [1]:
file = open('jobshop33.dat', 'r')

data =

# instance 3 jobs, 3 machines
3 3
2  4  0  1  1  2
0  2  1  2  2  5
0  2  1  4  2  2

The first line represents the number of jobs and the number of machines. Here 3 jobs are to be processed on 3 machines.

Each following line represents a job as a sequence of tasks. Each task is represented by its corresponding pair of machine and duration $(m_{ij},d_{ij})$. Here the first task of the first job must be processed on machine 2, and its completion lasts 1 unit of time. The second task must be processed on machine 0, for 3 units of time, and so on.

Then we parse this data:

In [2]:
# Parse data
lines = data.split('\n')

# Remove unnecessary comments
for line in lines:
    if line.startswith("#"):
# Parse first line with machines nb and jobs nb
firstLine = lines[0].split()
nb_jobs = int(firstLine[0])
nb_machines = int(firstLine[1])

# Parse each job
# For each job, parse each pair of numbers (m, d) where m is the machine id the task must be processed on
# and d is the duration of the task
machine_used = [[]]*nb_jobs
task_duration = [[]]*nb_jobs
for j in range(nb_jobs):
    line = lines[j+1]
    parts = line.split()
    nb_tasks = len(parts)/2
    machine_used[j] = [0]*nb_tasks
    task_duration[j] = [0]*nb_tasks
    for t in range(nb_tasks):
        machine_used[j][t] = int(parts[2*t])
        task_duration[j][t] = int(parts[2*t+1])

In each line, odd indices correspond to machines, and even indices to processing times. We store this information in nested lists machine_used and task_duration.

Here are the machines to use and task durations for each job:

In [3]:
for j in range(nb_jobs):
    print("Job " + str(j) + ":")
    print("\t m " + str(machine_used[j]))
    print("\t d " + str(task_duration[j]))
Job 0:
	 m [2, 0, 1]
	 d [4, 1, 2]
Job 1:
	 m [0, 1, 2]
	 d [2, 2, 5]
Job 2:
	 m [0, 1, 2]
	 d [2, 4, 2]

The Kalis Environment

In [4]:
import sys, os

from kalis import *

The KSession Class

In Artelys Kalis, problem statement and solving are carried out inside a KSession object. The first thing to do is to create it.

In [5]:
# Creation of the Kalis session
session = KSession()

Building the model

Our problem comprises variables, constraints, and might have solutions after search. In Artelys Kalis KProblem objects hold the modeling entities and solution objects.

In [6]:
# Creation of the problem in this session
problem = KProblem(session, "JobShop");

We initialize a KProblem object variable called problem. "JobShop" is the internal name of the problem.

Note how the first parameter set our KProblem into our KSession. Everytime we will initialize a Kalis object, the syntax will be similar.

Since scheduling problems are often addressed with Constraint Programming, Kalis implements user-friendly classes specifically designed to facilitate their statement. Tasks and resources (machines in our example), are gathered into a schedule.

In [7]:
s = KSchedule(problem, "JobShop Schedule", 0, 10000);

Here we define a KSchedule object called s into our Kproblem. The third and fourth parameters correspond to the time window involved in the first constraint. Our timeline starts at 0 and our horizon is 10000 units of time. "JobShop Schedule" is the internal name of s.

We move on and define our machines and tasks into this schedule s. We use KUnaryResource objects, which represent resources that can process at most one task at a time.

In [8]:
# Create resource list
machine = [KUnaryResource(s, "M%i" % m) for m in range(nb_machines)]

For the tasks, we use the constructor of KTask.

In [9]:
# Create task list
task =  {
    (j, t): KTask(s, "J%iT%i" % (j,t), 0, s.getTimeMax(), task_duration[j][t], task_duration[j][t])
    for j in range(nb_jobs)
        for t in range(len(task_duration[j]))

Our tasks are set to last between 0 unit of time and s.getTimeMax(), the horizon of our schedule, which we set at 10000 above. Last parameters state the minimum and maximum durations of our tasks. In our example each task has a specific, inflexible duration. Hence both parameters are set at task_duration[j][t].

Then we use the requires() method of the class KTask in order to associate each task to its corresponding machine.

In [10]:
# Post resource usage for each task
for j in range(nb_jobs):
    for t in range(len(machine_used[j])):
        task[j, t].requires(machine[machine_used[j][t]],1)

This method takes two input parameters : the resource to associate, and the amount of resource capacity our task requires. For the Job-Shop problem, all machines are unary resources, e.g. have a capacity of 1, and each task requires this unary capacity.

Finally we specify that tasks are ordered for a given job using the startsAfter() method.

In [11]:
# Post precedence constraints between the tasks of a same job
for j  in range(nb_jobs):    
    for t in range(1, len(task_duration[j])):
        task[j, t].startsAfter(task[j, t-1])

And that's it! We do not need to explictly state disjunctive constraints because we implemented resources as KUnaryResources and, consequently, our machines will naturally process only one task at a time (Kalis implements these constraints internally). Also note that the variables are implicitly declared in the KSchedule, which will assign a starting time to each KTask.

Solving the problem

Once the problem has been fully built, we call the optimize() method for Kalis to solve the problem. For a scheduling problem, this method minimizes the makespan by default.

In [12]:

We might also want to optimize another objective than the makespan. If so, before calling optimize(), we would use the setObjective() method to define an objective function. Then, the setSense() method to specify the sense of optimization. If we wish to minimize : setSense(KProblem::Minimize);.

As mentioned earlier, solutions are comprised in ourKProblem, which holds the modeling entities. We combine the getProblem() and getSolution() method on our KSchedule object to obtain the solution.

In [13]:
sol = s.getProblem().getSolution()

sol is a KSolution object. It contains the value of each decision variable and information about the resolution such as computation time or number of nodes explored in the search-tree.

The code below displays the solution:

In [14]:
# Get schedule per machine
machine_sol = [[]]*nb_machines
for m in range(nb_machines):
    machine_sol[m] = []
    for j in range(nb_jobs):
        for t in range(len(task_duration[j])):
            if machine_used[j][t] == m:
                machine_sol[m].append(('Task(' + str(j) + ',' + str(t) + ')',[sol.getValue(task[j,t].getStartDateVar()),

print "Solution:"
for m in range(nb_machines):
    machine_sol[m].sort(key=lambda x: x[1])
    print "Machine " + str(m) + ": " + str(machine_sol[m])  
print "\nMakespan: " + str(sol.getValue(s.getMakeSpan())) + "\n"
Machine 0: [('Task(1,0)', [0, 2]), ('Task(2,0)', [2, 4]), ('Task(0,1)', [4, 5])]
Machine 1: [('Task(1,1)', [2, 4]), ('Task(2,1)', [4, 8]), ('Task(0,2)', [8, 10])]
Machine 2: [('Task(0,0)', [0, 4]), ('Task(1,2)', [4, 9]), ('Task(2,2)', [9, 11])]

Makespan: 11

We get the schedule of each machine: sequence of tasks and time intervals.

Finally, here is the timeline for the optimal schedule, considering that one unit of time corresponds to one hour:

In [15]:
import datetime as dt
import plotly.offline as offline
import plotly.figure_factory as ff

start_date = dt.datetime(2018, 1, 1, 8, 0, 0)

df = []
for j in range(nb_jobs):
    for t in range(len(task_duration[j])):
        df.append(dict(Task='Machine '+str(machine_used[j][t]), 
        Start= start_date+dt.timedelta(hours=sol.getValue(task[j,t].getStartDateVar())), 
        Finish= start_date+dt.timedelta(hours=sol.getValue(task[j,t].getEndDateVar())), 
        Resource='Job '+str(j)))

# Display the Gantt chart
figure = ff.create_gantt(df, title = 'Optimal Schedule', index_col='Resource', show_colorbar=True, group_tasks=True)
offline.iplot(figure, filename='gantt-jobshop')