Epipolar Geometry
Table of Contents:
- Why know Epipolar Geometry?
- The Fundamental Matrix
- The Direct Linear Transform (DLT)
- Normalized 8-Point Algorithm
- Implementation: Visualizing Epipolar Lines
- Epipolar Point Transfer in Image Triplets
- Computing Epipoles
- Proof: the SVD provides the solution
Why know Epipolar Geometry?
Modern approaches to solving computer vision problems such as Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM) heavily rely upon “feature matching,” or the ability to find accurate keypoint correspondences between pairs of images. Tools from Epipolar geometry are a simple, and often effective, way to discard outliers in feature matching and are widely used.
The Fundamental Matrix
The Fundamental matrix \(F\) encodes the epipolar constraint \(x^{\prime T} F x = 0\), where \(x, x^{\prime}\) are corresponding 2D points (in homogeneous coordinates) in two separate images. In other words,
\[\begin{bmatrix} u^{\prime} & v^{\prime} & 1 \end{bmatrix} \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0\]
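This constraint is what makes \(F\) useful for outlier rejection: for a correct correspondence the residual \(x^{\prime T} F x\) should be near zero, while mismatches typically produce large residuals. A minimal sketch of such a check (the function name epipolar_residuals and the threshold are illustrative, not from any particular library):
import numpy as np

def epipolar_residuals(F: np.ndarray, pts_left: np.ndarray, pts_right: np.ndarray) -> np.ndarray:
    """Return |p_right^T F p_left| for each correspondence (an algebraic error, not a geometric distance)."""
    N = pts_left.shape[0]
    x = np.hstack([pts_left, np.ones((N, 1))])    # (N,3) homogeneous points in the left image
    xp = np.hstack([pts_right, np.ones((N, 1))])  # (N,3) homogeneous points in the right image
    return np.abs(np.einsum('ni,ij,nj->n', xp, F, x))

# e.g. keep only correspondences with a small residual (the threshold is arbitrary here;
# a point-to-epipolar-line distance is often preferred in practice):
# inliers = epipolar_residuals(F, pts_left, pts_right) < 0.5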
The Direct Linear Transform (DLT)
Longuet-Higgins’ 8-Point Algorithm [1] provides a solution for estimating \(F\) when at least 8 point correspondences are available. A system of linear equations is formed as follows:
\[Af = \begin{bmatrix} u_1^{\prime} u_1 & u_1^{\prime} v_1 & u_1^{\prime} & v_1^{\prime} u_1 & v_1^{\prime} v_1 & v_1^{\prime} & u_1 & v_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ u_n^{\prime} u_n & u_n^{\prime} v_n & u_n^{\prime} & v_n^{\prime} u_n & v_n^{\prime} v_n & v_n^{\prime} & u_n & v_n & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ \vdots \\ f_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}\]
The matrix-vector product above can be driven to zero by minimizing its norm, while avoiding the degenerate solution \(f=0\) by constraining the solution to lie on the unit sphere (below we write \(x\) for the vectorized \(f\)), e.g.
\[\begin{array}{ll} \underset{\|x\|=1}{\mbox{minimize}} & \|A x \|_2^2 = x^TA^TAx = x^TBx \end{array}\]
By the Courant-Fischer characterization, it is well known that if \(B\) is an \(n \times n\) symmetric matrix with eigenvalues \(\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \) and corresponding eigenvectors \(v_1, \dots, v_n\), then
\[v_n = \mbox{arg } \underset{\|x\|=1}{\mbox{min }} x^TBx\]
meaning the eigenvector associated with the smallest eigenvalue \(\lambda_n\) of \(B=A^TA\) is the solution \(x^{\star}\). This is exactly the right singular vector associated with the smallest singular value in the SVD \(A=U \Sigma V^T\). The vector \(x^{\star}\) contains the 9 entries of the Fundamental matrix \(F^{\star}\).
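A minimal NumPy sketch of this estimation (the function name eight_point_dlt is illustrative; the last two lines additionally project the estimate onto the set of rank-2 matrices, the standard singularity-constraint step described in [4]):
import numpy as np

def eight_point_dlt(pts_left: np.ndarray, pts_right: np.ndarray) -> np.ndarray:
    """Estimate F such that p_right^T @ F @ p_left = 0, from N >= 8 correspondences of shape (N,2)."""
    u, v = pts_left[:, 0], pts_left[:, 1]
    up, vp = pts_right[:, 0], pts_right[:, 1]
    ones = np.ones(len(u))
    # one row per correspondence, ordered to match f = (f11, f12, ..., f33)
    A = np.column_stack([up * u, up * v, up, vp * u, vp * v, vp, u, v, ones])
    # the minimizer of ||Af|| over ||f|| = 1 is the right singular vector
    # associated with the smallest singular value of A
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # enforce rank 2 by zeroing the smallest singular value of F
    Uf, Sf, Vtf = np.linalg.svd(F)
    return Uf @ np.diag([Sf[0], Sf[1], 0.0]) @ Vtf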
Normalized 8-Point Algorithm
Hartley [4] proposes a simple modification to the classical 8-point algorithm to make it robust to noise. By preceding the algorithm with a very simple normalization (translation and scaling) of the matched points’ coordinates, results comparable with the best iterative algorithms are obtained.
The main idea is to replace coordinates \(\mathbf{u}_a\) in image \(a\) with \(\mathbf{\hat{u}}_a = T_a \mathbf{u}_a\), and coordinates \(\mathbf{u}_b\) in image \(b\) with \(\mathbf{\hat{u}}_b = T_b \mathbf{u}_b\).
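Hartley’s suggested choice of \(T\) [4] translates the points so that their centroid sits at the origin, and scales them so that their average distance from the origin is \(\sqrt{2}\). A sketch (the helper name normalizing_transform is illustrative):
import numpy as np

def normalizing_transform(pts: np.ndarray) -> np.ndarray:
    """Return a 3x3 similarity T mapping (N,2) points so their centroid is at the origin
    and their mean distance from the origin is sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2) / mean_dist
    # in homogeneous coordinates: translate by -centroid, then scale by s
    return np.array([
        [s, 0, -s * centroid[0]],
        [0, s, -s * centroid[1]],
        [0, 0, 1.0],
    ])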
If \(T\) is chosen to be invertible, then we can recover the original coordinates from the transformed ones, as
\[\begin{aligned} \mathbf{\hat{u}}_a &= T_a \mathbf{u}_a \\ T_a^{-1} \mathbf{\hat{u}}_a &= T_a^{-1} T_a \mathbf{u}_a \\ T_a^{-1} \mathbf{\hat{u}}_a &= \mathbf{u}_a \end{aligned}\]
Substituting into the equation \(\mathbf{u}_b^T F \mathbf{u}_a = 0\), we derive the equation
\[\begin{aligned} \mathbf{u}_b^T F \mathbf{u}_a &= 0 \\ (T_b^{-1} \mathbf{\hat{u}}_b)^T F T_a^{-1} \mathbf{\hat{u}}_a &= 0 \\ \mathbf{\hat{u}}_b^T T_b^{-T} F T_a^{-1} \mathbf{\hat{u}}_a &= 0 \end{aligned}\]
If we use the normalized points \(\mathbf{\hat{u}}_a, \mathbf{\hat{u}}_b\) when fitting the Fundamental matrix, then we will end up estimating \(\hat{F} = T_b^{-T} F T_a^{-1} \). In other words, \(\mathbf{u}_b^T F \mathbf{u}_a = \mathbf{\hat{u}}_b^T \hat{F} \mathbf{\hat{u}}_a \). If we want to recover the original \(F\) that corresponds to the raw (unnormalized) point coordinates, then we need to transform back:
\[\begin{aligned} \hat{F} &= T_b^{-T} F T_a^{-1} \\ T_b^{T} \hat{F} &= T_b^{T} T_b^{-T} F T_a^{-1} \\ T_b^{T} \hat{F} T_a &= F T_a^{-1} T_a \\ T_b^{T} \hat{F} T_a &= F \end{aligned}\]
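Putting the pieces together, a sketch of the normalized 8-point algorithm, using the illustrative helpers eight_point_dlt and normalizing_transform defined above:
import numpy as np

def normalized_eight_point(pts_left: np.ndarray, pts_right: np.ndarray) -> np.ndarray:
    """Estimate F from (N,2) correspondences, with Hartley normalization."""
    T_a = normalizing_transform(pts_left)    # image a = left image
    T_b = normalizing_transform(pts_right)   # image b = right image
    # normalize: u_hat = T @ u (the homogeneous coordinate stays 1, so we drop it afterwards)
    N = len(pts_left)
    pts_left_hat = (T_a @ np.hstack([pts_left, np.ones((N, 1))]).T).T[:, :2]
    pts_right_hat = (T_b @ np.hstack([pts_right, np.ones((N, 1))]).T).T[:, :2]
    # fit F_hat on normalized coordinates, then undo the normalization: F = T_b^T @ F_hat @ T_a
    F_hat = eight_point_dlt(pts_left_hat, pts_right_hat)
    return T_b.T @ F_hat @ T_a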
Implementation: Visualizing Epipolar Lines
If p_right^T @ F @ p_left = 0 for correct correspondences, then two key insights are that:
- Each epipolar line \(l_e\) in the right image corresponds to a point p_left in the left image: le_right = F @ p_left.
- Each epipolar line \(l_e\) in the left image corresponds to a point p_right in the right image: le_left = F.T @ p_right.
We show below how to generate the epipolar line for each corresponding point. First, we’ll convert the 2D Cartesian keypoints to homogeneous coordinates:
from typing import Tuple
import numpy as np
import matplotlib.pyplot as plt

def convert_to_homogenous_coordinates(coords: np.ndarray) -> np.ndarray:
    """Convert (N,2) coordinates to the homogeneous system (by appending a column of ones)."""
    N = coords.shape[0]
    return np.hstack((coords, np.ones((N, 1))))
def draw_epipolar_lines(
    F: np.ndarray,
    img_left: np.ndarray,
    img_right: np.ndarray,
    pts_left: np.ndarray,
    pts_right: np.ndarray,
    figsize: Tuple[int, int] = (10, 8)
) -> None:
    """Draw the epipolar lines given the fundamental matrix, left & right images and left & right datapoints.

    Args:
        F: a 3 x 3 numpy array representing the fundamental matrix, such that
            p_right^T @ F @ p_left = 0 for correct correspondences.
        img_left: array representing image 1.
        img_right: array representing image 2.
        pts_left: array of shape (N,2) representing image 1 datapoints.
        pts_right: array of shape (N,2) representing image 2 datapoints.
        figsize: figure size, as (width, height).
    """
    fig, ax = plt.subplots(nrows=1, ncols=2, figsize=figsize)
    # definition of the epipolar line in the right image, corresponding to left point p:
    #   l_e = F @ p_left
    draw_lines_in_single_image(ax=ax[1], i2Fi1=F, pts_i1=pts_left, pts_i2=pts_right, img_i2=img_right)
    # definition of the epipolar line in the left image, corresponding to point p in the right image:
    #   l_e = F.T @ p_right
    draw_lines_in_single_image(ax=ax[0], i2Fi1=F.T, pts_i1=pts_right, pts_i2=pts_left, img_i2=img_left)
We’ll show how to compute and plot the lines for a single image first. We compute the left and right image border lines by taking cross products of the image corner points.
def draw_lines_in_single_image(ax, i2Fi1: np.ndarray, pts_i1: np.ndarray, pts_i2: np.ndarray, img_i2: np.ndarray) -> None:
    """Draw epipolar lines in image 2, where each epipolar line corresponds to a point from image 1.

    Args:
        ax: matplotlib drawing axis.
        i2Fi1: Fundamental matrix, mapping points in image 1 to lines in image 2.
        pts_i1: array of shape (N,2) representing image 1 datapoints.
        pts_i2: array of shape (N,2) representing image 2 datapoints.
        img_i2: array of shape (H,W,3) representing image 2.
    """
    img_h, img_w = img_i2.shape[:2]
    # corner points, as homogeneous (x,y,1)
    p_ul = np.asarray([0, 0, 1])
    p_ur = np.asarray([img_w, 0, 1])
    p_bl = np.asarray([0, img_h, 1])
    p_br = np.asarray([img_w, img_h, 1])
    # The equation of the line through two points can be determined
    # by taking the cross product of their homogeneous coordinates.
    # left and right border lines (applicable for both images)
    l_l = np.cross(p_ul, p_bl)
    l_r = np.cross(p_ur, p_br)
    # convert keypoints to homogeneous coordinates
    pts_i1 = convert_to_homogenous_coordinates(pts_i1)
    ax.imshow(img_i2)
    ax.autoscale(False)
    ax.scatter(pts_i2[:, 0], pts_i2[:, 1], marker='o', s=20, c='yellow', edgecolors='red')
    for p_i1 in pts_i1:
        # epipolar line in image 2, corresponding to the point p_i1 in image 1
        l_e = np.dot(i2Fi1, p_i1).squeeze()
        # find where the epipolar line crosses the left and right image borders
        p_l = np.cross(l_e, l_l)
        p_r = np.cross(l_e, l_r)
        # convert back from homogeneous to Cartesian by dividing by the 3rd entry,
        # then draw the line segment between the two border crossings
        x = [p_l[0] / p_l[2], p_r[0] / p_r[2]]
        y = [p_l[1] / p_l[2], p_r[1] / p_r[2]]
        ax.plot(x, y, linewidth=1, c='blue')
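A hypothetical usage example (the filenames and keypoint files below are placeholders; F could come from the normalized_eight_point sketch above, or from a library such as OpenCV):
import matplotlib.pyplot as plt
import numpy as np

img_left = plt.imread('door_1.jpg')           # placeholder filenames
img_right = plt.imread('door_2.jpg')
pts_left = np.loadtxt('matches_left.txt')     # (N,2) matched keypoints, N >= 8
pts_right = np.loadtxt('matches_right.txt')   # (N,2) matched keypoints, N >= 8
F = normalized_eight_point(pts_left, pts_right)
draw_epipolar_lines(F, img_left, img_right, pts_left, pts_right)
plt.show()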
Epipolar Point Transfer in Image Triplets
Given known epipolar geometry between a triplet of images, we can (in certain cases) know exactly where a corresponding keypoint should be found within a 3rd image. This allows us to “transfer points” to a 3rd image, using intersection of epipolar lines (via Fundamental matrices).
Imagine we have three views of a door, from Carl Olsson’s Lund Door dataset (Image 1, Image 2, Image 3):
Hartley and Zisserman [5] (p. 398) show how to do this. Suppose we know the three fundamental matrices \(F_{21}\), \(F_{31}\) and \(F_{32}\) relating the three views, and let points \(\mathbf{x}\) and \(\mathbf{x}^\prime\) in the first two views be a matched pair. We wish to find the corresponding point \(\mathbf{x}^{\prime\prime}\) in the third image.
The required point \(\mathbf{x}^{\prime\prime}\) matches point \(\mathbf{x}\) in the first image, and consequently must lie on the epipolar line corresponding to \(\mathbf{x}\). Since we know \(F_{31}\), this epipolar line may be computed, and is equal to \(F_{31}\mathbf{x}\). By a similar argument, \(\mathbf{x}^{\prime\prime}\) must lie on the epipolar line \(F_{32}\mathbf{x}^\prime\). Taking the intersection of the epipolar lines gives:
\[\mathbf{x}^{\prime\prime} = (F_{31} \mathbf{x}) \times (F_{32} \mathbf{x}^\prime)\]
Similar details can be found in Sweeney et al. [6], Section 3.1.
For example, in the 3rd image below, we see the epipolar lines that correspond to the matching point in image 1 and to the matching point in image 2:
The points are found at the intersection of the two epipolar lines (exactly where we expect them). Note that there are degenerate cases to be considered here (e.g. when the two epipolar lines are coincident, so their intersection is not a unique point).
def draw_epipolar_lines_image_triplet(
    img_i1: np.ndarray,
    img_i2: np.ndarray,
    img_i3: np.ndarray,
    pts_i1: np.ndarray,
    pts_i2: np.ndarray,
    pts_i3: np.ndarray,
    i3Fi1: np.ndarray,
    i3Fi2: np.ndarray,
    figsize: Tuple[int, int] = (10, 8)
) -> None:
    """Draw the lines in image 3 that correspond to points in image 1, and to points in image 2."""
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=figsize)
    draw_lines_in_single_image(ax=ax, i2Fi1=i3Fi1, pts_i1=pts_i1, pts_i2=pts_i3, img_i2=img_i3)
    draw_lines_in_single_image(ax=ax, i2Fi1=i3Fi2, pts_i1=pts_i2, pts_i2=pts_i3, img_i2=img_i3)
If we wish to measure the error between where we expected to see such a point, and where we actually see it, we can use:
def fmat_point_transfer(
    i3Fi1: np.ndarray,
    i3Fi2: np.ndarray,
    matched_keypoints_i1: np.ndarray,
    matched_keypoints_i2: np.ndarray,
    matched_keypoints_i3: np.ndarray,
) -> np.ndarray:
    """Transfer points to a 3rd image using intersection of epipolar lines (via Fundamental matrices), and measure error.

    Args:
        i3Fi1: Fundamental matrix mapping points in image 1 to lines in image 3.
        i3Fi2: Fundamental matrix mapping points in image 2 to lines in image 3.
        matched_keypoints_i1: (N,2) array representing measurements in view 1 for each of N tracks (in 3 views).
        matched_keypoints_i2: (N,2) array representing measurements in view 2 for each of N tracks (in 3 views).
        matched_keypoints_i3: (N,2) array representing measurements in view 3 for each of N tracks (in 3 views).

    Returns:
        Errors, as distances in the image plane.
    """
    matched_h_keypoints_i1 = convert_to_homogenous_coordinates(matched_keypoints_i1)
    matched_h_keypoints_i2 = convert_to_homogenous_coordinates(matched_keypoints_i2)
    matched_h_keypoints_i3 = convert_to_homogenous_coordinates(matched_keypoints_i3)
    J = len(matched_keypoints_i1)
    dists = np.zeros(J)
    for j in range(J):
        p1 = matched_h_keypoints_i1[j]
        p2 = matched_h_keypoints_i2[j]
        p3 = matched_h_keypoints_i3[j]
        # epipolar lines in image 3, from the point in image 1 and the point in image 2
        l = i3Fi1 @ p1
        l_ = i3Fi2 @ p2
        # the transferred point is the intersection of the two epipolar lines
        p3_expected = np.cross(l, l_)
        p3_expected = p3_expected[:2] / p3_expected[2]
        # 2D distance between the measured point and the transferred point
        error_2d = np.linalg.norm(p3[:2] - p3_expected)
        dists[j] = error_2d
    return dists
Computing Epipoles
Another definition, based on co-planarity, can be found in notes by Kris Kitani.
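The epipoles themselves can be read off from \(F\): every epipolar line \(F\mathbf{x}\) in the right image passes through the right epipole \(\mathbf{e}^{\prime}\), so \(\mathbf{e}^{\prime T} F \mathbf{x} = 0\) for all \(\mathbf{x}\), i.e. \(F^T \mathbf{e}^{\prime} = \mathbf{0}\); similarly \(F \mathbf{e} = \mathbf{0}\) for the left epipole \(\mathbf{e}\). Both are null vectors of \(F\) and can be computed from its SVD; a sketch:
from typing import Tuple
import numpy as np

def compute_epipoles(F: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Return (e_left, e_right) as homogeneous 3-vectors, where F @ e_left = 0 and F.T @ e_right = 0."""
    U, S, Vt = np.linalg.svd(F)
    e_left = Vt[-1]     # right null vector of F (epipole in the left image)
    e_right = U[:, -1]  # left null vector of F (epipole in the right image)
    return e_left, e_right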
Proof: the SVD provides the solution
The proof is almost always taken for granted, but we provide it here for completeness. This is a specific instance of the extremal trace [2] (or trace minimization on a unit sphere) problem, with \(k=1\), i.e.
\[\begin{aligned} \begin{array}{ll} \mbox{minimize} & \mathbf{\mbox{Tr}}(X^TBX) \\ \mbox{subject to} & X^TX=I_k \end{array} \end{aligned}\]
where \(I_k\) denotes the \(k \times k\) identity matrix. The unit-sphere constraint rules out the trivial solution \(X=0\), which would make the objective zero regardless of the eigenvalues of \(B\).
In our case, since \(U,V\) are orthogonal matrices (with orthonormal columns), then \(U^TU=I\). Thus, substituting the SVD of \(A\) yields \begin{equation} B = A^TA = (U\Sigma V^T)^T (U\Sigma V^T) = V\Sigma U^TU \Sigma V^T = V \Sigma^2 V^T. \end{equation}
Since \(B=A^TA\), \(B\) is symmetric, and thus the columns of \(V=\begin{bmatrix}v_1 \dots v_n \end{bmatrix}\) are eigenvectors of \(B\). \(V\) can equivalently be computed with the SVD of \(A\) or \(B\), since \(V\) appears in both decompositions: \(A=U \Sigma V^T\) and \(B=V\Sigma^2V^T\).
Because \(B\) is symmetric, there exists a set of \(n\) orthonormal eigenvectors, yielding an eigendecomposition \(B=V \Lambda V^T\) with \(\Lambda = \Sigma^2\). Thus,
\[\mbox{arg } \underset{\|x\|=1}{\mbox{min }} x^TBx = \mbox{arg } \underset{\|x\|=1}{\mbox{min }} x^T V \Lambda V^T x = \mbox{arg } \underset{\|x\|=1}{\mbox{min }} (V^Tx)^T \Lambda (V^Tx)\]
Since \(V\) is orthogonal, \(\|V^Tx\| = \|x\|\), so minimizing \((V^Tx)^T \Lambda (V^Tx)\) over the unit sphere is equivalent to minimizing \(x^T \Lambda x\) over the unit sphere. Since \(\Lambda\) is diagonal, \(x^T \Lambda x = \sum\limits_{i=1}^{n} \lambda_i x_i^2\), where \(\{\lambda_i\}_{i=1}^n\) are the eigenvalues of \(B\). Let \(q_i=x_i^2\), so \(q_i\geq 0\) since it represents a squared quantity. Since \(\|x\|=1\), we have \(\sum\limits_i x_i^2=1\), i.e. \(\sum\limits_i q_i = 1\). Thus,
\begin{equation} \underset{\|x\|=1}{\mbox{min }} x^TBx= \underset{\|x\|=1}{\mbox{min }} \sum\limits_{i=1}^{n} \lambda_i x_i^2= \underset{q_i \geq 0, \, \sum_i q_i = 1}{\mbox{min }} \sum\limits_i \lambda_i q_i = \lambda_n \end{equation}
where \(\lambda_n\) is the smallest eigenvalue of \(B\). The last equality follows since \(q_i \geq 0\) and \(\sum\limits_i q_i = 1\), so \(\sum\limits_i \lambda_i q_i\) is a convex combination of the numbers \( \{\lambda_i\}_{i=1}^n \) on the real line. By properties of a convex combination, the result must lie between the smallest and largest of these numbers, and the minimum \(\lambda_n\) is attained by setting \(q_n = 1\). Now that we know the minimum is \(\lambda_n\), we can obtain the argmin by the following observation:
Let \(v_n\) be the unit-norm eigenvector of \(B\) associated with \(\lambda_n\), so that \begin{equation} Bv_n = \lambda_n v_n \end{equation} Left multiplication with \(v_n^T\) simplifies the right side because \(v_n^Tv_n=1\) (eigenvectors are chosen to be unit norm, so \(v_n\) is feasible for the constraint \(\|x\|=1\)). We find: \begin{equation} v_n^T(Bv_n) = v_n^T (\lambda_n v_n) = v_n^Tv_n \lambda_n = \lambda_n \end{equation} Thus \(v_n\) attains the minimum value \(\lambda_n\), and the eigenvector associated with \(\lambda_n\) is \(x^{\star}\). \( \square\)
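As a quick numerical sanity check of this result, the sketch below compares the smallest right singular vector of a random \(A\) against the objective values attained by random unit vectors:
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 9))   # e.g. 20 correspondences, 9 unknowns
B = A.T @ A

# candidate solution: right singular vector of A with the smallest singular value
_, _, Vt = np.linalg.svd(A)
x_star = Vt[-1]
best = x_star @ B @ x_star

# no random unit vector should achieve a smaller value of x^T B x
xs = rng.standard_normal((100000, 9))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
sampled_min = np.einsum('ni,ij,nj->n', xs, B, xs).min()
assert best <= sampled_min + 1e-9
print(best, sampled_min, np.linalg.eigvalsh(B).min())  # best should equal the smallest eigenvalue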
References:
[1] H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293, 1981.
[2] S. Boyd. Low rank approximation and extremal gain problems, 2008. URL http://ee263.stanford.edu/notes/low_rank_approx.pdf.
[3] Aaron Bobick, online lectures. Video.
[4] Richard I. Hartley. In Defense of the Eight-Point Algorithm. IEEE TPAMI, 1997. PDF.
[5] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. 2nd ed., 2004.
[6] Sweeney et al. Optimizing the Viewing Graph for Structure-from-Motion. ICCV 2015. PDF