In [1]:

```
# %load ../../preconfig.py
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['axes.grid'] = False
#import numpy as np
#import pandas as pd
#import itertools
import logging
logger = logging.getLogger()
```

Grouping $n$ distince elements into a collection of disjoint sets.

Often need to perform two operations in particular:

finding the unique set that contains a given element.

uniting two sets.

A **disjoint-set data structure** maintains a collection $\delta = {S_1, S_2, \dotsc, \S_k}$ of disjoint dynamic sets.

We identify each set by a **representative**, which is some member of the set.

Letting $x$ denote an object, we wish to support the following operations:

MAKE-SET(x): $S_x = {x} : S_x \in \delta$

UNION(x,y): $S_x \cup S_y : x \in S_x, y \in S_y$

FIND-SET(x): $S_x : x \in S_x$

analyze the running times in terms of two parameters:

$n$, the number of MAKE-SET operations,

$m$, the total number of all kinds of operations.

The number of UNION operations must be less than MAKE-SET's, namely, $< n$. And we also have $m \geq n$.

An application: determining the connected components of an undirected graph.

In [3]:

```
plt.imshow(plt.imread('./res/find_connected.png'))
```

Out[3]:

In [5]:

```
plt.figure(figsize=(12,8))
plt.imshow(plt.imread('./res/fig21_1.png'))
```

Out[5]:

In [6]:

```
# Exercises
```

Each set:

head.

tail.

Each set member:

a pointer to the next object in the list.

a pointer back to the set object.

In [7]:

```
plt.figure(figsize=(15,8))
plt.imshow(plt.imread('./res/fig21_2.png'))
```

Out[7]:

MAKE-SET and FIND-SET requires $O(1)$ time.

note: FIND-SET(x), where $x$ is the pointer to the object expected.

UNION:

randomly merge: append one to another list, and update the related pointer. $O(n)$.

- construct a sequence of $m$ operations on $n$ objects that requires $O(n^2)$ time.

*weighted-union heuristic*: append the shortest list onto the longer. $\Omega(n)$each list need to include the length of the list.

construct a sequence of $m$ operations on $n$ objects that requires $O(m + n \lg n)$ time.

In [8]:

```
# Exercises
```

Disjoint-set forests: We represent sets by rooted trees, with each node containing one member and each tree representing one set.

For each node, we maintain a **rank**, which is an upper bound on the height of the node.

UNION operation: We make the root with smaller rank point to the root with larger rank.

In [2]:

```
plt.imshow(plt.imread('./res/fig21_4.png'))
```

Out[2]:

FIND-SET: We make each node on the find path point directl to the root.

In [3]:

```
plt.imshow(plt.imread('./res/fig21_5.png'))
```

Out[3]:

In [4]:

```
# Pseudocode for disjoint-set forests
plt.imshow(plt.imread('./res/forest.png'))
```

Out[4]:

When we use both union by rank and path compression, the worst-case running time is $O(m \alpha(n))$, where $\alpha(n)$ is a very slowly growing function, which we define in Section 21.4.

In [5]:

```
# Exercises
```

略