# %load ../../preconfig.py
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['axes.grid'] = False
from __future__ import division
#import numpy as np
#import pandas as pd
#import itertools
import logging
logger = logging.getLogger()
B-Trees are balanced search trees, similar to red-black trees, but they are better at minizing disk I/O operations.
We use the number of pages read or written as a approximation of the total time spent accessing the disk.
A B-tree node is usually as large as a whole disk page, and this size limits the number of children a B-tree node can have.
plt.imshow(plt.imread('./res/fig18_3.png'))
<matplotlib.image.AxesImage at 0x109444c50>
node attributes:
Each internal node has $x.n+1$ points $x.c_1, \dotsc, x.c_{x.n+1}$ to its children.
$x.c_i.keys \leq x.key_i \leq x.c_{i+1}.keys$
All leaves have the same depth $h$.
minimum degree of the B-tree $T$: a fixed integer $t \geq 2$.
The simplest B-tree occurs when $t=2$: 2-3-4 tree.
If $n \geq 1$, then for any $n$-key B-tree $T$ of height $h$ and minimum degree $t \geq 2$.
$$h \leq \log_t \frac{n+1}{2}$$
Proof: The root of a B-tree $T$ contains at least one key, and all other nodes contain at least $t-1$ keys.
plt.figure(figsize=(15,10))
plt.imshow(plt.imread('./res/fig18_4.png'))
<matplotlib.image.AxesImage at 0x10a750fd0>
the number $n$ of keys statisfies the inequality: $$ n \geq 1 + (t-1) \sum_{i=1}^h 2t^{i-1} = 2t^h -1$$
B-trees save a factor of about $\lg t$ over red-black trees in the number of nodes examined for most tree operations.
%maybe: exercise
two conventions:
The root of the B-tree is always in main memory.
Any nodes that are passed as parameters must already have had a Disk-Read operation performed on them.
We make a multiway branching decision according to the number of the node's children. $O(t lg_t n) = O(t) \times O(h)$.
#todo: code
$O(1)$ time to allcate one disk page.
#todo: code
As we trave down the tree, we split each full node we meet, in order to assure that the parent is always not full whenever we want to split a full node.
(a) B-TREE-SPLIT-CHILD(X,I)
plt.imshow(plt.imread('./res/fig18_5.png'))
<matplotlib.image.AxesImage at 0x10bfee110>
If $T$ is full tree, then create a new root node, and split them. (Height increase 1).
(b)
B-TREE-INSERT(T,K)
if T is FULL:
S = B-TREE-CREATE()
S.children = T
R = B-TREE-SPLIT-CHILD(S)
else
R = T
B-TREE-INSERT-NONFULL(R,K)
plt.imshow(plt.imread('./res/fig18_6.png'))
<matplotlib.image.AxesImage at 0x10c1cfe90>
(c)
B-TREE-INSERT-NONFULL(T,K)
if T is LEAF
FIND_AND_INSERT(T,K)
else
C = FIND_CHILD(T,K)
if C is FULL
N = B-TREE-SPLIT-CHILD(C)
R = FIND_CHILD(N,K)
else
R = T
B-TREE-INSERT-NONFULL(R, K)
#todo: exercises
When going down, if the node's child has only $t-1$ keys, move its key down to the child.
If the node to delete is a leaf node, then directly remove it.
If the node to delete is a internal node, then
Base on the three priciples, we summary all possible cases as follows:
plt.figure(figsize=(10,20))
plt.subplot(2,1,1)
plt.imshow(plt.imread('./res/fig18_8.png'))
plt.subplot(2,1,2)
plt.imshow(plt.imread('./res/fig18_9.png'))
<matplotlib.image.AxesImage at 0x10ce6e610>
#todo: exercise