Kevin J. Walchko, Phd
20 Nov 2018
Stereo vision uses two (or more) pinhole cameras. We model each camera with the same pinhole geometry as before, but because we have more than one view of the scene, we can also recover depth (z-direction) information.
[Multiple View Geometry in Computer Vision, Second Edition](http://www.robots.ox.ac.uk/~vgg/hzbook/hzbook2/HZepipolar.pdf)
We will look at calibrating a stereo camera using the standard chessboard pattern.
We need to estimate some camera parameters, both internal (intrinsic) and external (extrinsic). The basic process (a sketch with raw OpenCV calls is shown below):

1. `cv2.findChessboardCorners()` to find your calibration target's pattern (`objpts`) in the image (`imgpts`) for each gray scale image
    - `objpts` are how the features exist in the 3D world
    - `imgpts` are how the same features appear in the 2D image
2. `cv2.cornerSubPix()` to refine your results
3. `cv2.calibrateCamera()` to get the left/right camera intrinsic parameters
4. `cv2.stereoCalibrate()` to get the system extrinsic parameters

Now we have a set of parameters for the stereo pair: the intrinsic matrix `K` and distortion coefficients `d` for each camera, plus the rotation `R`, translation `T`, essential matrix `E`, and fundamental matrix `F` relating the two cameras.
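Below is a minimal sketch of that pipeline using raw OpenCV calls; the rest of this notebook uses the `opencv_camera` wrappers instead, and the board size, corner-refinement settings, and the `imgL`/`imgR` image lists are assumptions for illustration.

```python
import cv2
import numpy as np

# assumed 9x6 inner-corner chessboard with unit squares (matches the board used below)
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

objpoints, imgptsL, imgptsR = [], [], []
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

for grayL, grayR in zip(imgL, imgR):  # grayscale image pairs, loaded elsewhere
    okL, cL = cv2.findChessboardCorners(grayL, pattern)
    okR, cR = cv2.findChessboardCorners(grayR, pattern)
    if not (okL and okR):
        continue
    # refine the detected corners to sub-pixel accuracy
    cL = cv2.cornerSubPix(grayL, cL, (11, 11), (-1, -1), criteria)
    cR = cv2.cornerSubPix(grayR, cR, (11, 11), (-1, -1), criteria)
    objpoints.append(objp)
    imgptsL.append(cL)
    imgptsR.append(cR)

h, w = imgL[0].shape
# intrinsics (K, d) for each camera separately
_, K1, d1, _, _ = cv2.calibrateCamera(objpoints, imgptsL, (w, h), None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(objpoints, imgptsR, (w, h), None, None)
# extrinsics (R, T) between the cameras, plus E and F
rms, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgptsL, imgptsR, K1, d1, K2, d2, (w, h),
    flags=cv2.CALIB_FIX_INTRINSIC)
```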
Stereo matching generally searches along epipolar lines for features that match, which greatly reduces the search space and speeds things up. The essential ($E$) and fundamental ($F$) matrices encode the epipolar geometry between two views of the same scene. A line (and hence an epipolar line) in the image can be written as:
$$ au+bv+c=0 \\ p = \begin{bmatrix} u & v & 1 \end{bmatrix}^T \\ l = \begin{bmatrix} a & b & c \end{bmatrix}^T \\ p^T l = l^T p = 0 $$

It turns out we can use these equations for epipolar lines and substitute in $E$ or $F$:
$$ p^T_1 l_1 = p^T_1 E p_2 = 0 $$

where $p_1$ is a point in the left camera and $p_2$ is a point in the right camera.
Like $F$, $E$ carries knowledge of the stereo setup, but it operates in normalized (camera) coordinates rather than pixel coordinates; $F$ can be recovered from $E$ and the two intrinsic matrices.
$$ \begin{eqnarray} \hat x' = R(\hat x - T) \\ E = R[T]_x \\ [T]_x = \begin{bmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{bmatrix} \end{eqnarray} $$

Given image points in both cameras (with noise), $E$ can be estimated using a total least squares approach, commonly referred to as the eight-point algorithm.
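As a quick sanity check of the formula above, here is a minimal sketch that builds the skew-symmetric matrix $[T]_x$ and forms $E = R[T]_x$ with NumPy; `R` and `T` are assumed to be the rotation and translation from a stereo calibration (e.g. `cm["R"]` and `cm["T"]` below).

```python
import numpy as np

def skew(t):
    # skew-symmetric (cross-product) matrix of a 3-vector
    tx, ty, tz = np.asarray(t).ravel()
    return np.array([
        [0.0, -tz,  ty],
        [ tz, 0.0, -tx],
        [-ty,  tx, 0.0]])

# E = R [T]_x, following the convention used above
E = R @ skew(T)
```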
See CSE486 [3] for examples of finding $F$ and calculating epipolar lines.
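For a concrete OpenCV example, $F$ can also be estimated directly from matched pixel coordinates with the eight-point algorithm and then used to compute epipolar lines. This is a minimal sketch; `ptsL` and `ptsR` are assumed to be matched (N, 2) pixel coordinates, e.g. detected chessboard corners for one image pair.

```python
import cv2
import numpy as np

# estimate F with the 8-point algorithm (needs at least 8 matches)
F, mask = cv2.findFundamentalMat(np.float32(ptsL), np.float32(ptsR), cv2.FM_8POINT)

# epipolar lines in the right image for points from the left image
# (whichImage=1 means the points live in the first/left image)
linesR = cv2.computeCorrespondEpilines(np.float32(ptsL).reshape(-1, 1, 2), 1, F)
linesR = linesR.reshape(-1, 3)  # each row is (a, b, c) with a*u + b*v + c = 0
```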
During camera manufacturing, small mechanical misalignments are introduced which can affect stereo performance. To fix the images and make straight lines (think epipolar lines) straight again:
1. `cv2.stereoRectify()` to produce the necessary rectification matrices
2. `cv2.initUndistortRectifyMap()` for the left/right camera to turn those matrices into pixel remapping tables
3. `cv2.remap()` to apply the maps produced by the above functions

The good thing, though, is the first two functions only need to be called once so you have everything. Then, during run-time, you can just use `cv2.remap()` on each new image pair (see the sketch below).
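Here is a minimal sketch of that rectification pipeline using raw OpenCV calls (the notebook below uses the `opencv_camera` wrapper instead); `K1, d1, K2, d2, R, T` are assumed to come from stereo calibration and `(w, h)` is the image size.

```python
import cv2

# one-time setup: rectification transforms and pixel remapping tables
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, d1, K2, d2, (w, h), R, T, flags=cv2.CALIB_ZERO_DISPARITY)

map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)

# run-time: apply the maps to every new image pair
rectL = cv2.remap(imgL[0], map1x, map1y, cv2.INTER_LINEAR)
rectR = cv2.remap(imgR[0], map2x, map2y, cv2.INTER_LINEAR)
```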
Objects closer to the stereo cameras will have larger disparity, meaning they will be found in noticeably different places in the two images. Conversely, objects farther away will have smaller disparity and will be found in roughly the same place in both images.
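For a rectified pair, the standard pinhole relationship between depth and disparity (with $f$ the focal length in pixels, $B$ the baseline, and $d$ the disparity in pixels) is:

$$ Z = \frac{f B}{d} $$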
Stereo models like `cv2.StereoSGBM` or `cv2.StereoBM` can be used to find corresponding points in the image pair (left and right) and calculate the disparity between them (in pixels). Instead of matching individual pixels, the stereo models tend to match blocks of pixels.
`NaN` means the feature couldn't be matched.

OpenCV 3.4 docs: ref
Class for computing stereo correspondence using the block matching algorithm, introduced and contributed to OpenCV by K. Konolige. Stereo algorithms tend not to match points (which is what the math above leads you to believe) but rather they match square blocks of pixels (e.g., 5x5, 7x7).
OpenCV 3.4 docs: ref
The class implements the modified H. Hirschmuller algorithm that differs from the original one as follows:
- By default, the algorithm is single-pass. Set `mode=StereoSGBM::MODE_HH` in `StereoSGBM_create` to run the full variant of the algorithm, but beware that it may consume a lot of memory.
- Some pre- and post-processing steps from the `StereoBM` algorithm are included, for example: pre-filtering (`StereoBM::PREFILTER_XSOBEL` type) and post-filtering (uniqueness check, quadratic interpolation and speckle filtering).
- `mode`: set it to `StereoSGBM::MODE_HH` to run the full-scale two-pass dynamic programming algorithm. It will consume O(W*H*numDisparities) bytes, which is large for 640x480 stereo and huge for HD-size pictures. By default, it is set to `False`.
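As an illustration of how these block matchers are used, here is a minimal sketch that computes a disparity map with `cv2.StereoSGBM_create` on a rectified pair and converts it to depth using $Z = fB/d$; the parameter values, the rectified images `rectL`/`rectR`, and the focal length/baseline numbers (taken roughly from the calibration results below, with the baseline in chessboard-square units) are assumptions for illustration.

```python
import cv2
import numpy as np

# rectified grayscale images rectL, rectR are assumed to exist (see rectification above)
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,     # must be divisible by 16
    blockSize=5,           # matched block size (odd)
    P1=8 * 5 * 5,          # smoothness penalties on disparity changes
    P2=32 * 5 * 5,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2)

# StereoSGBM returns fixed-point disparities scaled by 16
disp = sgbm.compute(rectL, rectR).astype(np.float32) / 16.0

# depth from disparity: Z = f * B / d
f = 535.0   # focal length in pixels (roughly what the calibration below reports)
B = 3.33    # baseline, in chessboard-square units (|T| from the calibration below)
valid = disp > 0
Z = np.zeros_like(disp)
Z[valid] = f * B / disp[valid]
```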
%load_ext autoreload
%autoreload 2
import numpy as np
np.set_printoptions(precision=1)
np.set_printoptions(suppress=True)
import cv2
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20,10) # set default figure width
from slurm import storage
from pprint import pprint
from pathlib import Path
import opencv_camera as oc
from opencv_camera import CameraCalibration
from opencv_camera import StereoCalibration
from opencv_camera import ChessboardFinder
from opencv_camera import UnDistort
from opencv_camera import bgr2gray, gray2bgr
from opencv_camera import drawEpipolarLines
from opencv_camera import coverage, mosaic
from opencv_camera import stereoOverlay
from opencv_camera import StereoCamera
def readImages(pattern):
    # read the matching calibration images from cal_pics/ as grayscale, sorted by filename
    p = Path("cal_pics").glob(pattern)
    p = list(p)
    p.sort()
    return [cv2.imread(str(f),0) for f in p]
imgL = readImages("left*.jpg")
imgR = readImages("right*.jpg")
# a simple mosaic of the various images
plt.imshow(mosaic(imgL, width=4), cmap="gray")
plt.title(f"{len(imgL)} Images @ {imgL[0].shape}")
plt.axis("off");
stereoCal = StereoCalibration()
board = ChessboardFinder((9,6), 1)
# N: detected corners
# cm.imgpointsL/R: list(np.array.shape(N,2))
# cm.objpoints: list(np.array.shape(N,3))
ok, cm = stereoCal.calibrate(imgL, imgR, board)
Object Pts: 13 Left Image Pts: 13 Right Image Pts: 13 (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) (54, 3) objpoints <class 'list'> 13 (54, 3) imgpoints_l <class 'list'> 13 (54, 2) imgpoints_r <class 'list'> 13 (54, 2)
# ip[0]
cm["objpoints"][0].shape
(54, 3)
# let's plot where every corner was detected. Notice
# we are not testing the edges of the camera! That
# is where a lot of distortion occurs from low quality
# lenses. Ideally, you want better coverage.
plt.subplot(1,2,1)
ip = cm["imgpointsL"]
icv = coverage((480, 640), ip)
plt.imshow(icv)
plt.title("Left Camera Target Points");
plt.subplot(1,2,2)
ip = cm["imgpointsR"]
icv = coverage((480, 640), ip)
plt.imshow(icv)
plt.title("Right Camera Target Points");
sc = StereoCamera(
cm["K1"],cm["d1"],
cm["K2"],cm["d2"],
cm["R"],cm["T"],
cm["F"],
cm["E"]
)
print(sc)
sc.to_yaml("camera.yml")
Camera 1 -------------------------- focalLength(x,y): 533.4 533.4 px principlePoint(x,y): 342.5 234.7 px distortionCoeffs: [[-0.282 0.039 0.001 -0. 0.119]] Camera 2 -------------------------- focalLength(x,y): 537.0 536.6 px principlePoint(x,y): 327.4 249.9 px distortionCoeffs: [[-0.296 0.139 -0.001 0. -0.049]] Extrinsic Camera Parameters ------- Translation between Left/Right Camera: [[-3.327] [ 0.037] [-0.005]] Rotation between Left/Right Camera: [[ 1. 0.004 0.004] [-0.004 1. -0.007] [-0.004 0.007 1. ]] Essential Matrix: [[-0. 0.005 0.037] [-0.019 0.024 3.327] [-0.025 -3.327 0.024]] Fundatmental Matrix: [[ 0. -0. -0.001] [ 0. -0. -0.092] [ 0. 0.093 1. ]]
# See the answer from OpenCV Tutorials
# this is the answer they came up with
ans = cv2.FileStorage("cal_pics/intrinsics.yml", cv2.FILE_STORAGE_READ)
print("Left camera K1 and D1")
print("-"*40)
print(ans.getNode("M1").mat())
print(ans.getNode("D1").mat())
print("")
print("Right camera K2 and D2")
print("-"*40)
print(ans.getNode("M2").mat())
print(ans.getNode("D2").mat())
Left camera K1 and D1 ---------------------------------------- [[534.803 0. 335.686] [ 0. 534.803 240.662] [ 0. 0. 1. ]] [[ 0.296 -1.035 0. 0. 0. ]] Right camera K2 and D2 ---------------------------------------- [[534.803 0. 334.557] [ 0. 534.803 242.053] [ 0. 0. 1. ]] [[-0.169 -0.112 0. 0. 0. ]]
# h,w = imgL[0].shape
# ud = sc.getUndistortion(h, w)
# num = 3
# cl, cr = ud.undistort(imgL[num], imgR[num])
# corr = drawEpipolarLines(cm["imgpointsL"],cm["imgpointsR"],cl,cr)
# plt.imshow(corr, cmap="gray")
# plt.title(f"{corr.shape}");
# from opencv_camera.stereo.utils import computeF
# # calculated from opencv
# print("OpenCV:")
# print(sc.F)
# # come up with the same normalized F matrix as OpenCV
# print("\ncomputeF:")
# ee = computeF(sc.K1, sc.K2, sc.R, sc.T)
# print(ee)
cv2.stereoRectify()
r1,r2,p1,p2,q,roi1,roi2 = cv2.stereoRectify(
sc.K1,sc.d1,
sc.K2,sc.d2,
(480,640),
sc.R,sc.T,
flags=cv2.CALIB_ZERO_DISPARITY)
print(r1,"\n---------")
print(r2,"\n---------")
print(p1,"\n---------")
print(p2,"\n---------")
print(q,"\n---------")
print(f"left: {roi1} right: {roi2}")
[[ 1. -0.008 0.006] [ 0.008 1. -0.004] [-0.006 0.004 1. ]] --------- [[ 1. -0.011 0.001] [ 0.011 1. 0.004] [-0.001 -0.004 1. ]] --------- [[535.022 0. 343.639 0. ] [ 0. 535.022 235.731 0. ] [ 0. 0. 1. 0. ]] --------- [[ 535.022 0. 343.639 -1780.16 ] [ 0. 535.022 235.731 0. ] [ 0. 0. 1. 0. ]] --------- [[ 1. 0. 0. -343.639] [ 0. 1. 0. -235.731] [ 0. 0. 0. 535.022] [ 0. 0. 0.301 -0. ]] --------- left: (0, 0, 480, 640) right: (14, 0, 466, 623)
num = 0
# ud was only created in a commented-out cell above; recreate it here
h, w = imgL[0].shape
ud = sc.getUndistortion(h, w)
ll, rr = ud.undistort(imgL[num], imgR[num])
cc = np.hstack((ll, rr))
plt.imshow(cc, cmap="gray");
# for fun, let's overlay the images and adjust to get an idea
# of how well the rectification worked ... it won't be perfect
# depending on setup of the cameras and collection. Also
# remember, we already pointed out the entire image plane was
# not covered with target features (corners).
tmp = stereoOverlay(ll, rr, 130,-5)
plt.imshow(tmp, cmap="gray")
plt.title(f"{tmp.shape}");