It is well known that least squares parameter estimation is not resistent, i.e. sensitive to outliers. Even one outlier may break least squares estimation, which means a possibly infinitely large parameter bias.
This problem is effectively solved by RANSAC (RANdom Sample And Consensus) introduced By Fischler and Bolles in 1981. They solved a problem with a large percent of outliers in the input data. Even 50% of outliers may be tolerated by RANSAC.
Therefore RANSAC is a resistent iterative parameter estimation procedure that uses a subset of data consistent with a model. At each iteration step the procedure works as follows:
Iteration ends upon reaching a specified maximum number or upon a specific criterion. Model parameters are then estimated using the best model.
RANSAC is illustrated with a simple problem.
Generate conformal data that approximately fit on an ellipse:
import numpy as np
t = np.linspace(0, 2 * np.pi, 50)
a = 10
b = 5
xc = 20
yc = 30
theta = np.pi/6
x = xc + a*np.cos(theta)*np.cos(t) - b*np.sin(theta)*np.sin(t)
y = yc + a*np.sin(theta)*np.cos(t) + b*np.cos(theta)*np.sin(t)
data = np.column_stack([x, y])
# for reproducibility:
np.random.seed(seed=1234)
data += np.random.normal(size=data.shape)
Add some outliers:
data[0] = (100, 100)
data[1] = (110, 120)
data[2] = (120, 130)
data[3] = (140, 130)
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(data[:,0],data[:,1],'r.')
plt.show()
The image processing library Scikit-image will be used for estimation. It contains implementation of both least squares and RANSAC estimation that can be used for our problem. We import only a few functions in sm.py
import sm
model = sm.EllipseModel()
model.estimate(data)
# ellipse parameters:
# xc, yc, a, b, theta
np.set_printoptions(suppress=True)
model.params
[73.53514375565935, 78.26758207018908, 9.294963688237154, 80.77273222385621, 2.3039489382317697]
As we see these parameters are completely wrong. Original semi-axes of the ellipse were 10 and 5, the centre was at $O(20,30)$.
# at least 5 points are needed for ellipse fitting
n_min = 5
# maximum distance of conform points
t_max = 3.0
ransac_model, inliers = sm.ransac(data, sm.EllipseModel, n_min, t_max, max_trials=50)
print(ransac_model.params)
original_params = np.array([xc,yc,a,b,theta])
print(original_params - ransac_model.params)
[20.060780840092757, 29.44604087550921, 5.615422139522865, 9.2913820935979, 2.092276704111832] [-0.06078084 0.55395912 4.38457786 -4.29138209 -1.56867793]
Conformal data were found and stored in array 'inliers' as logical 'True' elements:
inliers
array([False, False, False, False, True, True, True, True, True, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])
We see that the first four data are outliers that were added by us.
t = np.linspace(0, 2 * np.pi, 100)
p = ransac_model.params
xe = p[0] + p[2]*np.cos(p[4])*np.cos(t) - p[3]*np.sin(p[4])*np.sin(t)
ye = p[1] + p[2]*np.sin(p[4])*np.cos(t) + p[3]*np.cos(p[4])*np.sin(t)
plt.clf()
plt.plot(data[inliers,0],data[inliers,1],'r.')
plt.plot(xe,ye,'b-')
plt.plot(p[0],p[1],'bo')
plt.axis('equal')
plt.show()