stough 202-
Images and videos can be quite large. At 60 frames per second, a 4K Ultra HD movie of 2 hours could require
\begin{equation*} (3840\cdot2160)\,\text{pixels} \cdot 3\,\tfrac{\text{bytes}}{\text{pixel}} = 24883200\,\tfrac{\text{bytes}}{\text{frame}}\\ \frac{24883200\,\frac{\text{bytes}}{\text{frame}}\cdot60\,\frac{\text{frames}}{\text{second}}\cdot60\,\frac{\text{seconds}}{\text{minute}}\cdot120\,\text{minutes}}{10^9\,\frac{\text{bytes}}{\text{GB}}} = 10749.5\ \text{GB}, \end{equation*}or about 163 Blu-ray discs to store without compression. So clearly compression is a thing.
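That arithmetic is easy to sanity-check directly:

```python
# Sanity check of the storage arithmetic above.
bytes_per_frame = 3840 * 2160 * 3            # 3 bytes/pixel, 8 bits/channel RGB
frames = 60 * 60 * 120                       # 60 fps * 60 s/min * 120 min
total_gb = bytes_per_frame * frames / 10**9  # bytes -> GB

print(bytes_per_frame, round(total_gb, 1))  # 24883200 10749.5
```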
Compression takes advantage of different kinds of redundancy in the signal. Images in particular have a lot of redundancy. Spatial and temporal redundancy are characterized by the fact that most pixels are closely related to their neighbors: images of human interest are likely to contain objects comprised of regions of similar color. Temporal redundancy is the video case, where a pixel's value usually changes little from one frame to the next.
Coding redundancy is the one we're dealing with here. Most images we consider are 3-channel images with 8 bits per pixel per channel. We need 8 bits to represent a value in $[0, 255]$. Variable-length encoding can take advantage of the differential frequency of certain pixel values over others, encoding more common pixels with fewer bits, and paying for that by using more bits to represent less common pixels.
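As a toy illustration of that trade (the frequencies and codewords below are made up, not taken from any real image):

```python
# Made-up symbol frequencies; frequent symbols get short codewords.
freqs = {'A': 0.6, 'B': 0.2, 'C': 0.1, 'D': 0.1}
code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}  # a prefix-free code

fixed_bits = 2  # fixed-length coding of 4 symbols needs 2 bits each
avg_bits = sum(freqs[s] * len(code[s]) for s in freqs)
print(fixed_bits, round(avg_bits, 2))  # 2 1.6
```

The common symbol 'A' costs 1 bit, the rare ones cost 3; on average we come out ahead of the fixed-length 2 bits per symbol.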
In this notebook we'll explore Huffman variable-length encoding and its effectiveness in some images.
Notice I've added our own custom Huffman utilities huff_utils and scipy.stats.entropy. Study build_huff_tree in particular.
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors as mcolors
import skimage.color as color
from ipywidgets import VBox, HBox, FloatLogSlider
from scipy.stats import entropy
# For importing from alternative directory sources
import sys
sys.path.insert(0, '../dip_utils')
from huff_utils import (build_huff_tree,
build_huff_encoder,
build_huff_pair,
load_huffable_image,
test_tree_making)
from matrix_utils import (arr_info,
make_linmap)
from vis_utils import (vis_rgb_cube,
vis_hsv_cube,
vis_hists,
vis_pair,
vis_image,
lab_uniform)
# load_huffable_image accepts either an image array or a file path.
I = plt.imread('../dip_pics/happy128.png')
Ih = load_huffable_image(I)
Ih = load_huffable_image('../dip_pics/happy128.png')
arr_info(Ih)
vis_image(Ih, cmap='gray',interpolation='none')
vis_hists(I)
# How many bits would we need?
8*np.prod(Ih.shape)
bins = np.arange(257)
freq, bins = np.histogram(Ih.ravel(), bins=bins)
# What's the average actual information per pixel in the image?
entropy(freq, base=2)
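For the record, scipy.stats.entropy just normalizes the counts to a probability distribution and computes Shannon entropy. The same number falls out by hand; here on a toy count vector (hypothetical counts, for illustration):

```python
import numpy as np
from scipy.stats import entropy

counts = np.array([10., 20., 30., 40.])  # toy histogram counts
p = counts / counts.sum()                # normalize to probabilities

# Shannon entropy in bits: -sum(p * log2 p) over the nonzero bins.
H_manual = -np.sum(p[p > 0] * np.log2(p[p > 0]))
print(np.isclose(H_manual, entropy(counts, base=2)))  # True
```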
A Huffman tree is a binary tree structure whose leaf nodes represent the symbols in a signal (the pixel values in an image). The path from the root of the tree to a leaf node represents the bit encoding of the associated symbol (0 for left child, 1 for right child, for example). Let's see an example:
test_tree_making()
In the example above we start with pixel values in ['A', 'B', 'C', 'D']. The Huffman tree construction algorithm sorts these symbols (image intensities, for example) from least to most frequent as a collection of leaf nodes (a min-heap is useful for this), then repeatedly merges the two least frequent nodes into a new internal node whose frequency is their sum, until only the root remains.
The structure of that final tree represents an optimal encoding given the relative frequencies provided.
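That merge loop can be sketched in a few lines with Python's heapq (a minimal version for illustration only; build_huff_tree in huff_utils is the real implementation):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Minimal Huffman coder sketch: freqs maps symbol -> frequency."""
    tiebreak = count()  # keeps heap entries comparable on frequency ties
    heap = [(f, next(tiebreak), {s: ''}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # the two least frequent nodes...
        f2, _, c2 = heapq.heappop(heap)
        # ...merge: prepend 0 to one subtree's codes, 1 to the other's.
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes({'A': 5, 'B': 2, 'C': 1, 'D': 1})
# The most frequent symbol gets the shortest codeword:
print(sorted((s, len(c)) for s, c in codes.items()))  # [('A', 1), ('B', 2), ('C', 3), ('D', 3)]
```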
See the code in huff_utils. One thing you'll notice about the encoder is that some pixel values require many more than 8 bits! How could that possibly save space?
encoder, decoder = build_huff_pair(Ih)
encoder
decoder
# Let's encode the image.
enI = ''.join([encoder[pix] for pix in Ih.ravel()])
len(enI)
8*np.prod(Ih.shape)
# Compression ratio: old bits to new bits needed.
8*np.prod(Ih.shape)/len(enI)
# Bits per pixel we're actually using; compare to entropy determined above.
len(enI)/np.prod(Ih.shape)
So in the above we see how we can take an image--an array of pixel values--and encode it using the Huffman variable-length encoding. In short:
I = load_huffable_image('../dip_pics/happy128.png')
encoder, decoder = build_huff_pair(I)
enI = ''.join(encoder[pix] for pix in I.ravel())
After the above section executes, enI is the encoded image (a sequence of bits), which can be stored or transmitted more efficiently. However, when display of the image is needed again, we're going to need to decode the bit sequence back into an array of pixel values. We can use the decoder dictionary to do this.
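Because the code is prefix-free, decoding can proceed greedily, one bit at a time. Here's the idea on a toy code (the codewords and values below are made up for illustration, not the actual decoder contents):

```python
# Toy decoder dict mapping codeword -> pixel value, analogous to decoder above.
toy_decoder = {'0': 65, '10': 66, '110': 67, '111': 68}

def decode_toy(bitstring, decoder):
    """Greedy prefix-code decoding: grow a buffer until it matches a codeword."""
    out, buf = [], ''
    for bit in bitstring:
        buf += bit
        if buf in decoder:  # prefix-free: the first match is the right one
            out.append(decoder[buf])
            buf = ''
    return out

print(decode_toy('0100110111', toy_decoder))  # [65, 66, 65, 67, 68]
```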
Write a decode_image function that accepts such an encoded string (like enI) and the decoder dictionary, and outputs a list of intensities equal to the uncompressed image I.ravel(). Show a validation test of this function. See array_equal: is the reconstructed array of intensities equal to the original (I.ravel() for example)?
Now take the output of decode_image to reconstruct the original image I. Even though our encoded stream does not include the image shape, you may cheat there and use the original image shape "out of band" so to speak. In practice we could send along the image shape with the encoded stream without too much trouble. Show your reconstruction code and a figure supporting its success. See numpy.reshape.
Do this with a color image (that is not happyFace). Perform Huffman compression separately on each of the three channels and show that you can encode, and then decode and reconstruct such a color image.