Chapter 2 of Real World Algorithms.

Panos Louridas

Athens University of Economics and Business

The most common way to represent graphs in Python is with adjacency lists.

The adjacency lists are put into a Python dictionary.

The keys of the dictionary are the nodes, and the value for each node is its adjacency list.

With a slight abuse of terminology, we could use other data structures instead of a list to represent an adjacency list: for example, a set is a sensible choice, as we don't care about the order of the items in the list, and checking for membership (i.e., checking if a node is a neighbor of another node) is much faster in a set than in a list. In a well-implemented set it takes constant time, while in a list the time is linear and depends on the length of the list.

In [1]:

```
g = {
0: [1, 2, 3],
1: [0, 4],
2: [0],
3: [0, 5],
4: [1, 5],
5: [3, 4, 6, 7],
6: [5],
7: [5],
}
# print whole graph
print(g)
# print adjacency list of node 0
print(g[0])
# print adjacency list of node 5
print(g[5])
```

Similarly, here is the same graph, but this time the nodes are strings (single-character strings, which are still strings in Python).

Nodes can be anything: numbers, strings, or anything else that can be used as a key in a Python dictionary, i.e., everything that is "hashable".

In [2]:

```
g = {
'a': ['b', 'c', 'd'],
'b': ['a', 'e'],
'c': ['a'],
'd': ['a', 'f'],
'e': ['b', 'f'],
'f': ['d', 'e', 'g', 'h'],
'g': ['f'],
'h': ['f'],
}
# print whole graph
print(g)
# print adjacency list of node 'a'
print(g['a'])
# print adjacency list of node 'e'
print(g['e'])
```

Suppose we have the following graph and we want to explore it depth-first.

In depth-first search, we follow a path as far as we can; when we reach a dead-end, that is, a node with no-unvisited neighbours, we backgrack to the previous unvisited node.

The graph is represented in Python as follows:

In [3]:

```
g = {
0: [1, 2, 3],
1: [0, 4],
2: [0, 4],
3: [0, 5],
4: [5],
5: [4, 6, 7],
6: [],
7: []
}
```

The depth-first recursive search algorithm is then simply:

In [4]:

```
from typing import Hashable # for use with the type annotation below
visited = [ False ] * len(g)
def dfs(g: dict, node: Hashable) -> None:
print("Visiting", node)
visited[node] = True
for v in g[node]:
if not visited[v]:
dfs(g, v)
dfs(g, 0)
```

Visiting 0 Visiting 1 Visiting 4 Visiting 5 Visiting 6 Visiting 7 Visiting 2 Visiting 3

It is possible to implement depth-first search without recursion.

To do that, we have to emulate recursion ourselves, by using a stack.

In [5]:

```
def dfs_stack(g: dict, node: Hashable) -> list[bool]:
s = []
visited = [ False ] * len(g)
s.append(node)
while len(s) != 0:
print("Stack", s)
c = s.pop()
print("Visiting", c)
visited[c] = True
for v in g[c]:
if not visited[v]:
s.append(v)
return visited
dfs_stack(g, 0)
```

Stack [0] Visiting 0 Stack [1, 2, 3] Visiting 3 Stack [1, 2, 5] Visiting 5 Stack [1, 2, 4, 6, 7] Visiting 7 Stack [1, 2, 4, 6] Visiting 6 Stack [1, 2, 4] Visiting 4 Stack [1, 2] Visiting 2 Stack [1] Visiting 1

Out[5]:

[True, True, True, True, True, True, True, True]

The stack-based depth-first search may insert a node in the stack multiple times.

For example, consider the following graph:

The graph is represented as follows:

In [6]:

```
g2 = {
0: [1, 2, 3],
1: [0, 4],
2: [0],
3: [0, 5],
4: [1, 5],
5: [3, 4, 6, 7],
6: [5],
7: [5]
}
```

Then we can traverse with with the stack-based version of depth-first search:

In [7]:

```
dfs_stack(g2, 0)
```

Stack [0] Visiting 0 Stack [1, 2, 3] Visiting 3 Stack [1, 2, 5] Visiting 5 Stack [1, 2, 4, 6, 7] Visiting 7 Stack [1, 2, 4, 6] Visiting 6 Stack [1, 2, 4] Visiting 4 Stack [1, 2, 1] Visiting 1 Stack [1, 2] Visiting 2 Stack [1] Visiting 1

Out[7]:

[True, True, True, True, True, True, True, True]

You may notice that node 1 enters the stack twice.

That does not affect the correctness of the algorithm, as the algorithm will explore the whole graph, but we can fix it anyway.

One way to fix it would be to search in the stack and if the node is already there, we would not put it.

However, searching in a list takes place in linear time, depending on the length of the list.

It is faster to keep a separate structure in which we record if something is in the stack.

That requires more space: an instance of speed-space trade-off.

In [8]:

```
def dfs_nd_stack(g: dict, node: Hashable) -> list[bool]:
s = []
visited = [ False ] * len(g)
instack = [ False ] * len(g)
s.append(node)
instack[node] = True
while len(s) != 0:
print("Stack", s)
c = s.pop()
instack[c] = False
print("Visiting", c)
visited[c] = True
for v in g[c]:
if not visited[v] and not instack[v]:
s.append(v)
instack[v] = True
return visited
dfs_nd_stack(g2, 0)
```

Out[8]:

[True, True, True, True, True, True, True, True]

In breadth-first search we visit all neighbours of a node, then all the neighbours of the neighbours, and so on.

The exploration is like a ripple spreading outwards.

We can implement breadth-first search using a First-In First-Out (FIFO) queue; in Python this is provided by `collections.deque`

.

In [9]:

```
from collections import deque
g = {
0: [1, 2, 3],
1: [0, 4],
2: [0, 4],
3: [0, 5],
4: [5],
5: [4, 6, 7],
6: [],
7: []
}
def bfs(g: dict, node: Hashable) -> list[bool]:
q = deque()
visited = [ False ] * len(g)
inqueue = [ False ] * len(g)
q.appendleft(node)
inqueue[node] = True
while not (len(q) == 0):
print("Queue", q)
c = q.pop()
print("Visiting", c)
inqueue[c] = False
visited[c] = True
for v in g[c]:
if not visited[v] and not inqueue[v]:
q.appendleft(v)
inqueue[v] = True
return visited
bfs(g, 0)
```

Queue deque([0]) Visiting 0 Queue deque([3, 2, 1]) Visiting 1 Queue deque([4, 3, 2]) Visiting 2 Queue deque([4, 3]) Visiting 3 Queue deque([5, 4]) Visiting 4 Queue deque([5]) Visiting 5 Queue deque([7, 6]) Visiting 6 Queue deque([7]) Visiting 7

Out[9]:

[True, True, True, True, True, True, True, True]

Usually we read graphs from files, typically text files.

A common way to store graphs is in text files where each line contains a link between two nodes.

For example, the file containing the first graph we saw would be:

```
0 1
0 2
0 3
1 0
1 4
2 0
2 4
3 0
3 5
4 5
5 4
5 6
5 7
```

To read this file we would go line-by-line.

We would split each line on whitespace.

We would then get the two parts and treat them as nodes.

Note that we assume that the nodes are integers, so we convert the split pieces with `int(x)`

. If nodes were strings, this would not be required.

We also assume that the graph is directed.

The following example will read file example_graph_1.txt, the directed graph we used for the depth-first example, which has the following contents:

```
0 1
0 2
0 3
1 0
1 4
2 0
2 4
3 0
3 5
4 5
5 4
5 6
5 7
```

In [10]:

```
input_filename = "example_graph_1.txt"
g = {}
with open(input_filename) as graph_input:
for line in graph_input:
# Split line and convert line parts to integers.
nodes = [int(x) for x in line.split()]
if len(nodes) != 2:
continue
# If a node is not already in the graph
# we must create a new empty list.
if nodes[0] not in g:
g[nodes[0]] = []
if nodes[1] not in g:
g[nodes[1]] = []
# We need to append the "to" node
# to the existing list for the "from" node.
g[nodes[0]].append(nodes[1])
print(g)
```

{0: [1, 2, 3], 1: [0, 4], 2: [0, 4], 3: [0, 5], 4: [5], 5: [4, 6, 7], 6: [], 7: []}

Printing a graph like that is not very convenient.

Python offers the `pprint`

(pretty-print) library that can help us output stuff in a more meaningful manner.

In [11]:

```
import pprint
pprint.pprint(g)
```

{0: [1, 2, 3], 1: [0, 4], 2: [0, 4], 3: [0, 5], 4: [5], 5: [4, 6, 7], 6: [], 7: []}

For undirected graphs, the code is pretty much the same; we only need to take care to enter the edge $(v, u)$ for every edge $(u, v)$ that we meet in the file.

Here is the equivalent for the file example_graph_2.txt, which is the undirected graph we used for depth-first search.

In [12]:

```
input_filename = "example_graph_2.txt"
g = {}
with open(input_filename) as graph_input:
for line in graph_input:
# Split line and convert line parts to integers.
nodes = [int(x) for x in line.split()]
if len(nodes) != 2:
continue
# If a node is not already in the graph
# we must create a new empty list.
if nodes[0] not in g:
g[nodes[0]] = []
if nodes[1] not in g:
g[nodes[1]] = []
# We need to append the "to" node
# to the existing list for the "from" node.
g[nodes[0]].append(nodes[1])
# And also the other way round.
g[nodes[1]].append(nodes[0])
pprint.pprint(g)
```

{0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 5], 4: [1, 5], 5: [3, 4, 6, 7], 6: [5], 7: [5]}