import cv2
%matplotlib notebook
from matplotlib import pyplot as plt
In OpenCV's Python wrapper, the imread function returns an image as a NumPy array. The array dimensions can be read from the shape attribute:
lenna = cv2.imread('data/lenna.tiff', cv2.IMREAD_GRAYSCALE)
lenna.shape
(512, 512)
Grayscale images are commonly represented by 2D arrays of 8-bit unsigned integers, corresponding to values from 0 ("black") to 255 ("white"). In NumPy, this data type (dtype) is named uint8.
lenna.dtype
dtype('uint8')
lenna
array([[162, 162, 162, ..., 170, 155, 128],
       [162, 162, 162, ..., 170, 155, 128],
       [162, 162, 162, ..., 170, 155, 128],
       ...,
       [ 43,  43,  50, ..., 104, 100,  98],
       [ 44,  44,  55, ..., 104, 105, 108],
       [ 44,  44,  55, ..., 104, 105, 108]], dtype=uint8)
lenna[0,0]
162
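One caveat worth keeping in mind with uint8 arrays (not covered above, and illustrated here on a synthetic array rather than the Lenna image): NumPy integer arithmetic wraps around modulo 256, so operations that can leave the [0, 255] range should be done in a wider type first.

```python
import numpy as np

a = np.array([250, 100], dtype=np.uint8)

# uint8 addition wraps modulo 256: 250 + 10 becomes 4, not 260
wrapped = a + np.uint8(10)
print(wrapped)  # [  4 110]

# Promoting to a wider integer type before the operation avoids the wrap
safe = a.astype(np.int16) + 10
print(safe)     # [260 110]
```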
plt.imshow(lenna, cmap=plt.cm.gray)
plt.colorbar()
Color images in RGB are commonly represented by 24 bits per pixel, 8 bits for each of the three channels (red, green and blue). In NumPy, an $M \times N$ color image can be represented by an $M \times N \times 3$ uint8 array.
mandrill = cv2.imread('data/mandrill.tiff')
mandrill.shape
(512, 512, 3)
mandrill.dtype
dtype('uint8')
mandrill[2,3]
array([29, 46, 54], dtype=uint8)
The output of the last command above shows the value at pixel $(2, 3)$ is (29, 46, 54). The image was loaded by OpenCV's imread procedure, and OpenCV loads color images in BGR order, so 29, 46 and 54 correspond to the values of blue, green and red, respectively. The triplet is returned as a 3-D vector (a one-dimensional array). Direct access to a single color value is done by also indexing the color dimension: mandrill[2,3,1] returns the green channel value, 46.
mandrill[2,3,1]
46
plt.imshow(mandrill)
mandrill_rgb = cv2.cvtColor(mandrill, cv2.COLOR_BGR2RGB)
plt.imshow(mandrill_rgb)
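cvtColor is the canonical conversion, but the same BGR-to-RGB swap can be done with NumPy alone by reversing the channel axis with a negative-step slice. A minimal sketch on a synthetic 2×2 "image" (no image file required):

```python
import numpy as np

# Synthetic 2x2 BGR image: each pixel is [blue, green, red]
bgr = np.array([[[29, 46, 54], [0, 0, 255]],
                [[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)

# Reversing the last axis swaps BGR -> RGB; this is a view, no data is copied
rgb = bgr[:, :, ::-1]
print(rgb[0, 0])  # [54 46 29]
```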
Images are not limited to non-negative integer types. Image convolutions can produce negative integers or real values. For example, the Sobel convolution kernel produces negative values representing the derivatives.
sobel = cv2.Sobel(lenna, cv2.CV_16SC1, 1, 1)
plt.imshow(sobel, cmap=plt.cm.viridis)
plt.colorbar()
sobel[100,100]
-8
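To see where negative values come from without OpenCV, the horizontal Sobel kernel can be applied by hand at a single pixel of a tiny synthetic array. This is a NumPy-only sketch of the derivative idea (not the exact cv2.Sobel call above, which with dx=1, dy=1 computes a mixed derivative):

```python
import numpy as np

# Horizontal Sobel kernel: responds to left-to-right intensity changes
kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=np.int16)

# A bright-to-dark vertical edge: columns 0-2 are 200, columns 3-4 are 50
img = np.full((5, 5), 200, dtype=np.int16)
img[:, 3:] = 50

# Correlation at the centre pixel (2, 2): sum of kernel * 3x3 neighbourhood
patch = img[1:4, 1:4]
response = int((kx * patch).sum())
print(response)  # -600: intensity decreases to the right
```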
import numpy as np
thermo = np.loadtxt('data/thermo-maize.csv', delimiter=';')
plt.title(r'Thermography image (float32)')
plt.imshow(thermo, cmap=plt.cm.hot)
plt.colorbar()
As seen previously in the Mandrill example, an array's elements can be indexed using the [] operator. Standard Python slicing can also be employed to retrieve parts of an array. Slicing follows the convention start:stop:step. For example, to retrieve rows 3 to 9 of a two-dimensional array $A$, the code A[3:10,:] is used (note stop is non-inclusive). Similarly, to also limit the columns to the range 5 to 8, A[3:10,5:9] is employed. If only rows 3, 5, 7 and 9 are of interest, a step of 2 can be used, producing A[3:10:2,5:9].
A = np.arange(100).reshape(10,10)
A
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
A[3:10:2,5:9]
array([[35, 36, 37, 38],
       [55, 56, 57, 58],
       [75, 76, 77, 78],
       [95, 96, 97, 98]])
A NumPy array can also be indexed by masks, defined as boolean or integer arrays. This approach is frequently called fancy indexing. In the example below, a boolean mask is produced by applying a logical operation to an array.
lenna > 128
array([[ True,  True,  True, ...,  True,  True, False],
       [ True,  True,  True, ...,  True,  True, False],
       [ True,  True,  True, ...,  True,  True, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]], dtype=bool)
The zeros_like function produces an array with the same dimensions as the input Lenna image, but with all pixel values set to zero. Combined with the assignment in the last line, the code below produces a matrix res whose pixels are 255 where the Lenna image is brighter than 128, and zero elsewhere:
res = np.zeros_like(lenna)
res[lenna > 128] = 255
plt.subplot(1,2,1)
plt.imshow(lenna, cmap=plt.cm.gray)
plt.subplot(1,2,2)
plt.imshow(res, cmap=plt.cm.binary_r)
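The other flavour of fancy indexing mentioned above uses integer arrays. A small sketch on the same $10 \times 10$ array $A$ used earlier:

```python
import numpy as np

A = np.arange(100).reshape(10, 10)

# Integer fancy indexing: pick rows 3, 5, 7 and 9 in one shot
rows = np.array([3, 5, 7, 9])
print(A[rows, 0])         # first column of those rows: [30 50 70 90]

# Paired row/column index arrays select individual elements, not a sub-grid
print(A[[0, 1], [2, 3]])  # elements A[0,2] and A[1,3]: [ 2 13]
```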
Factors such as ambient light intensity or camera gain produce variations in image contrast. These factors can be compensated by whitening, a per-pixel operation that normalizes the intensity, producing an image with zero mean and unit variance. This example shows how whitening can be performed efficiently with vectorized operations, while keeping the simplicity of its mathematical definition.
I = cv2.imread('data/girl.jpg', cv2.IMREAD_GRAYSCALE)
plt.imshow(I, cmap=plt.cm.gray, vmin=0, vmax=255)
plt.colorbar()
Let $\mu_I$ and $\sigma_I$ be the mean and the standard deviation of a grayscale image $I$. The whitening operation is defined by:
\begin{equation}
W_I[i,j] = \frac{I[i,j] - \mu_I}{\sigma_I}.
\end{equation}
mu_I = np.mean(I)
sigma_I = np.std(I)
W_I = (I - mu_I)/sigma_I
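A quick sanity check of the definition, on a synthetic array (since the girl.jpg file may not be at hand): the whitened image should have mean 0 and standard deviation 1, up to floating-point error.

```python
import numpy as np

# Synthetic "image" standing in for I
I = np.array([[10, 20, 30],
              [40, 50, 60]], dtype=np.uint8)

mu_I = np.mean(I)
sigma_I = np.std(I)
W_I = (I - mu_I) / sigma_I  # the subtraction promotes uint8 to float64

assert np.isclose(W_I.mean(), 0.0)
assert np.isclose(W_I.std(), 1.0)
```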
NumPy is able to perform scalar-array operations. In the code above, the scalar $\mu_I$ is subtracted from every element, and the resulting array is divided by another scalar, $\sigma_I$. Below, the pixel values are converted to the [0, 255] range for an 8-bit representation.
delta_W = W_I.max() - W_I.min()
W_uint8 = np.array(255./delta_W * (W_I - W_I.min()), dtype=np.uint8)
plt.imshow(W_uint8, cmap=plt.cm.gray, vmin=0, vmax=255)
plt.colorbar()
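The conversion back to [0, 255] above is a min-max rescaling. Written as a small helper it generalizes to any float image; the name to_uint8 is hypothetical, not part of NumPy or OpenCV:

```python
import numpy as np

def to_uint8(img):
    """Linearly rescale a float array to the [0, 255] range as uint8."""
    span = img.max() - img.min()
    return np.array(255. / span * (img - img.min()), dtype=np.uint8)

W = np.array([[-2.0, 0.0],
              [ 1.0, 2.0]])
out = to_uint8(W)
print(out.min(), out.max())  # 0 255
```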
Sometimes, procedures must be limited to a region of interest (ROI), a rectangular part of the image. In NumPy, a ROI is equivalent to the idea of a view, an array sharing memory with another one. In the example below, $B$ is a view on array $A$. As expected, changes to $B$'s values produce the same changes in $A$.
A = np.arange(25).reshape(5,-1)
A
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
B = A[0:3,0:3]
B
array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12]])
B[0,0] = 255
B
array([[255,   1,   2],
       [  5,   6,   7],
       [ 10,  11,  12]])
A
array([[255,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14],
       [ 15,  16,  17,  18,  19],
       [ 20,  21,  22,  23,  24]])
Conversely, if $A$ must be preserved from any change in $B$, a copy of $A$ is necessary. In the example below, the copy method allocates new memory, the data is copied, and $A$ and $B$ do not share any memory.
B = A[0:3,0:3].copy()
B[0,0] = 128
B
array([[128,   1,   2],
       [  5,   6,   7],
       [ 10,  11,  12]])
A
array([[255,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14],
       [ 15,  16,  17,  18,  19],
       [ 20,  21,  22,  23,  24]])
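When it is not obvious whether an operation returned a view or a copy, np.shares_memory makes the distinction explicit:

```python
import numpy as np

A = np.arange(25).reshape(5, -1)

view = A[0:3, 0:3]          # slicing returns a view
copy = A[0:3, 0:3].copy()   # .copy() allocates new memory

print(np.shares_memory(A, view))  # True
print(np.shares_memory(A, copy))  # False
```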
Consider the UCI ML hand-written digits dataset, available through scikit-learn:
from sklearn import datasets
digits = datasets.load_digits()
digits['data'].shape
(1797, 64)
print digits['DESCR']
Optical Recognition of Handwritten Digits Data Set
===================================================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where each class refers to a digit.

Preprocessing programs made available by NIST were used to extract normalized bitmaps of handwritten digits from a preprinted form. From a total of 43 people, 30 contributed to the training set and different 13 to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of 4x4 and the number of on pixels are counted in each block. This generates an input matrix of 8x8 where each element is an integer in the range 0..16. This reduces dimensionality and gives invariance to small distortions.

For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G. T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C. L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469, 1994.

References
----------
- C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their Applications to Handwritten Digit Recognition, MSc Thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University.
- E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
- Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering Nanyang Technological University. 2005.
- Claudio Gentile. A New Approximate Maximal Margin Classification Algorithm. NIPS. 2000.
x = digits['data'][0]
x
array([  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.,   0.,   0.,  13.,
        15.,  10.,  15.,   5.,   0.,   0.,   3.,  15.,   2.,   0.,  11.,
         8.,   0.,   0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.,   0.,
         5.,   8.,   0.,   0.,   9.,   8.,   0.,   0.,   4.,  11.,   0.,
         1.,  12.,   7.,   0.,   0.,   2.,  14.,   5.,  10.,  12.,   0.,
         0.,   0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.])
plt.imshow(digits['data'][0:30,:], interpolation='nearest', cmap=plt.cm.gray)
plt.xlabel('Dimensions')
plt.ylabel('Digit sample')
Reshaping is another operation that produces a view on an array. The reshaped array is a view presenting the same number of elements, but different dimensions. As an example, consider the Handwritten Digits Data Set from the UCI Machine Learning Repository. In the handwritten digits classification problem, the $N \times N$ images are usually flattened into $N^2$-dimensional feature vectors for supervised machine learning. A 64-dimensional feature vector $\mathbf{x}$ can be viewed as an $8 \times 8$ image by:
X = x.reshape(8,8)
X
array([[  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.],
       [  0.,   0.,  13.,  15.,  10.,  15.,   5.,   0.],
       [  0.,   3.,  15.,   2.,   0.,  11.,   8.,   0.],
       [  0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.],
       [  0.,   5.,   8.,   0.,   0.,   9.,   8.,   0.],
       [  0.,   4.,  11.,   0.,   1.,  12.,   7.,   0.],
       [  0.,   2.,  14.,   5.,  10.,  12.,   0.,   0.],
       [  0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.]])
plt.imshow(X, interpolation='nearest', cmap=plt.cm.gray)
Similarly, an $8 \times 8$ image can be viewed as a 64-dimensional vector using:
x = X.reshape(-1)
x
array([  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.,   0.,   0.,  13.,
        15.,  10.,  15.,   5.,   0.,   0.,   3.,  15.,   2.,   0.,  11.,
         8.,   0.,   0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.,   0.,
         5.,   8.,   0.,   0.,   9.,   8.,   0.,   0.,   4.,  11.,   0.,
         1.,  12.,   7.,   0.,   0.,   2.,  14.,   5.,  10.,  12.,   0.,
         0.,   0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.])
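For a contiguous array, reshape normally returns a view, so flattening a whole dataset of images (or the reverse) copies no data. This can be verified with np.shares_memory on a synthetic stand-in for the digits data:

```python
import numpy as np

# Stand-in for the digits data: 5 samples of 64 features each
data = np.arange(5 * 64, dtype=np.float64).reshape(5, 64)

# View every 64-d feature vector as an 8x8 image, all samples at once
images = data.reshape(5, 8, 8)

print(np.shares_memory(data, images))  # True: no data copied
print(images[0].shape)                 # (8, 8)
```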