stough 202-
Images and videos can be quite large. At 60 frames per second, a 4K Ultra HD movie of 2 hours could require
\begin{equation*} (3840\cdot2160)\,\text{pixels} \cdot 3\,\tfrac{\text{bytes}}{\text{pixel}} = 24883200\,\tfrac{\text{bytes}}{\text{frame}}\\ \frac{24883200\,\frac{\text{bytes}}{\text{frame}}\cdot60\,\frac{\text{frames}}{\text{second}}\cdot60\,\frac{\text{seconds}}{\text{minute}}\cdot120\,\text{minutes}}{10^9\,\frac{\text{bytes}}{\text{GB}}} = 10749.5\ \text{GB}, \end{equation*}or about 163 Blu-ray discs to store without compression. So clearly compression is a thing.
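That arithmetic is easy to sanity-check directly:

```python
# Sanity check of the storage arithmetic above.
bytes_per_frame = 3840 * 2160 * 3            # 3 bytes/pixel, 8 bits/channel RGB
frames = 60 * 60 * 120                       # 60 fps * 60 s/min * 120 min
total_gb = bytes_per_frame * frames / 10**9  # bytes -> GB

print(bytes_per_frame, round(total_gb, 1))  # 24883200 10749.5
```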
Compression takes advantage of different kinds of redundancy in the signal. Images in particular have a lot of redundancy. Spatial and temporal redundancy are characterized by the fact that most pixels are closely related to their neighbors: images of human interest are likely to contain objects comprised of regions of similar color. Temporal redundancy is the video case, where a pixel's value usually changes little from one frame to the next.
Coding redundancy is the one we're dealing with here. Most images we consider are 3-channel images with 8 bits per pixel per channel. We need 8 bits to represent a value in $[0, 255]$. Variable-length encoding can take advantage of the differential frequency of certain pixel values over others, encoding more common pixels with fewer bits, and paying for that by using more bits to represent less common pixels.
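As a toy illustration of that trade (the frequencies and codewords below are made up, not taken from any real image):

```python
# Made-up symbol frequencies; frequent symbols get short codewords.
freqs = {'A': 0.6, 'B': 0.2, 'C': 0.1, 'D': 0.1}
code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}  # a prefix-free code

fixed_bits = 2  # fixed-length coding of 4 symbols needs 2 bits each
avg_bits = sum(freqs[s] * len(code[s]) for s in freqs)
print(fixed_bits, round(avg_bits, 2))  # 2 1.6
```

The common symbol 'A' costs 1 bit, the rare ones cost 3; on average we come out ahead of the fixed-length 2 bits per symbol.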
In this notebook we'll explore Huffman variable-length encoding and its effectiveness in some images.
Notice I've added our own custom Huffman utilities huff_utils and scipy.stats.entropy. Study build_huff_tree in particular.
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors as mcolors
import skimage.color as color
from ipywidgets import VBox, HBox, FloatLogSlider
from scipy.stats import entropy
# For importing from alternative directory sources
import sys
sys.path.insert(0, '../dip_utils')
from huff_utils import (build_huff_tree,
build_huff_encoder,
build_huff_pair,
load_huffable_image,
test_tree_making)
from matrix_utils import (arr_info,
make_linmap)
from vis_utils import (vis_rgb_cube,
vis_hsv_cube,
vis_hists,
vis_pair,
vis_image,
lab_uniform)
# load_huffable_image accepts either an image array or a file path.
I = plt.imread('../dip_pics/happy128.png')
Ih = load_huffable_image(I)
Ih = load_huffable_image('../dip_pics/happy128.png')
arr_info(Ih)
vis_image(Ih, cmap='gray',interpolation='none')
vis_hists(I)
# How many bits would we need?
8*np.prod(Ih.shape)
bins = np.arange(257)
freq, bins = np.histogram(Ih.ravel(), bins=bins)
# What's the average actual information per pixel in the image?
entropy(freq, base=2)
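For the record, scipy.stats.entropy just normalizes the counts to a probability distribution and computes Shannon entropy. The same number falls out by hand; here on a toy count vector (hypothetical counts, for illustration):

```python
import numpy as np
from scipy.stats import entropy

counts = np.array([10., 20., 30., 40.])  # toy histogram counts
p = counts / counts.sum()                # normalize to probabilities

# Shannon entropy in bits: -sum(p * log2 p) over the nonzero bins.
H_manual = -np.sum(p[p > 0] * np.log2(p[p > 0]))
print(np.isclose(H_manual, entropy(counts, base=2)))  # True
```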
A Huffman tree is a binary tree structure whose leaf nodes represent the symbols in a signal (the pixel values in an image). The path from the root of the tree to a leaf node represents the bit encoding of the associated symbol (0 for left child, 1 for right child, for example). Let's see an example:
test_tree_making()
In the example above we start with pixel values in ['A', 'B', 'C', 'D']. The Huffman tree construction algorithm sorts these symbols (image intensities, for example) from least to most frequent as a collection of leaf nodes (a min-heap is useful for this), then repeatedly merges the two least frequent nodes into a new internal node whose frequency is their sum, until only the root remains.
The structure of that final tree represents an optimal encoding given the relative frequencies provided.
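That merge loop can be sketched in a few lines with Python's heapq (a minimal version for illustration only; build_huff_tree in huff_utils is the real implementation):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Minimal Huffman coder sketch: freqs maps symbol -> frequency."""
    tiebreak = count()  # keeps heap entries comparable on frequency ties
    heap = [(f, next(tiebreak), {s: ''}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # the two least frequent nodes...
        f2, _, c2 = heapq.heappop(heap)
        # ...merge: prepend 0 to one subtree's codes, 1 to the other's.
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes({'A': 5, 'B': 2, 'C': 1, 'D': 1})
# The most frequent symbol gets the shortest codeword:
print(sorted((s, len(c)) for s, c in codes.items()))  # [('A', 1), ('B', 2), ('C', 3), ('D', 3)]
```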
See the code in huff_utils. One thing you'll notice about the encoder is that some pixel values require many more than 8 bits! How could that possibly save space?
encoder, decoder = build_huff_pair(Ih)
encoder
decoder
# Let's encode the image.
enI = ''.join([encoder[pix] for pix in Ih.ravel()])
len(enI)
8*np.prod(Ih.shape)
# Compression ratio: old bits to new bits needed.
8*np.prod(Ih.shape)/len(enI)
# Bits per pixel we're actually using; compare to entropy determined above.
len(enI)/np.prod(Ih.shape)
So in the above we see how we can take an image--an array of pixel values--and encode it using the Huffman variable-length encoding. In short:
I = load_huffable_image('../dip_pics/happy128.png')
encoder, decoder = build_huff_pair(I)
enI = ''.join(encoder[pix] for pix in I.ravel())
After the above section executes, enI is the encoded image (a sequence of bits), which can be stored or transmitted more efficiently. However, when display of the image is needed again, we're going to need to decode the bit sequence back into an array of pixel values. We can use the decoder dictionary to do this.
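Because the code is prefix-free, decoding can proceed greedily, one bit at a time. Here's the idea on a toy code (the codewords and values below are made up for illustration, not the actual decoder contents):

```python
# Toy decoder dict mapping codeword -> pixel value, analogous to decoder above.
toy_decoder = {'0': 65, '10': 66, '110': 67, '111': 68}

def decode_toy(bitstring, decoder):
    """Greedy prefix-code decoding: grow a buffer until it matches a codeword."""
    out, buf = [], ''
    for bit in bitstring:
        buf += bit
        if buf in decoder:  # prefix-free: the first match is the right one
            out.append(decoder[buf])
            buf = ''
    return out

print(decode_toy('0100110111', toy_decoder))  # [65, 66, 65, 67, 68]
```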
Write a decode_image function that accepts such an encoded string (like enI) and the decoder dictionary, and outputs a list of intensities equal to the uncompressed image I.ravel(). Show a validation test of this function. See array_equal: is the reconstructed array of intensities equal to the original (I.ravel() for example)?
Now take the output of decode_image to reconstruct the original image I. Even though our encoded stream does not include the image shape, you may cheat there and use the original image shape "out of band" so to speak. In practice we could send along the image shape with the encoded stream without too much trouble. Show your reconstruction code and a figure supporting its success. See numpy.reshape.
Do this with a color image (that is not happyFace). Perform Huffman compression separately on each of the three channels and show that you can encode, and then decode and reconstruct such a color image.