Table of Contents:

The 3D Registration Problem

The 3D Registration Problem is defined as follows: given two shapes \(A\) and \(B\) which partially overlap, use only rigid transforms to register \(B\) against \(A\) by minimizing a measure of distance \(\delta\) between A and B. We assume \(A\) and \(B\) are positioned close to each other. The optimization problem can be phrased as:

\[\min\limits_T \delta(A, T(B))\]

Degrees of Freedom: Transform estimation

A rigid motion has 6 degrees of freedom (3 for translation and 3 for rotation). We typically estimate the motion using many more pairs of corresponding points, so the problem is overdetermined (which is good, given noise, outliers, etc – use least squares approaches).

Key Challenges of the Hard Optimization Problem

  1. We must estimate correspondences. This gives rise to combinatorial searches.
  2. We must estimate the aligning transform. Transforms may be non-linear.

Fortunately, the low-dimensionality of some transforms helps.

Optimal Transformation for Point Clouds: Derivation

When given correspondences, the problem is formulated as:

Given two sets points: \(\{a_i \}_{i=1}^n, \{ b_i \}_{i=1}^n\) in . Find the rigid transform \(\mathbf{R} \in SO(3), t \in R^3\) that minimizes

\[\underset{\mathbf{R}, t}{\mbox{minimize }} \sum\limits_{i=1}^N \| \mathbf{R}x_i + t - y_i \|_2^2\] \[\sum\limits_{i=1}^n \|Ra_i − t − b_i \|_2^2\]

We can define the centroid of A as:

\[\bar{a} = \frac{1}{|A|} \sum\limits_i a_i\]

Similarly for B,

\[\bar{b} = \frac{1}{|B|} \sum\limits_i b_i\]

Defining the centroids as above will help simplify the minimization problem. Let us now define the following:

\[a_i^{\prime} = a_i − \bar{a}\] \[b_i^{\prime} = b_i − \bar{b}\]

We can re-write our original points as:

\[a_i = a_i^{\prime} + \bar{a}, \hspace{10mm} b_i = b_i^{\prime} + \bar{b}\]

Plugging these back into our original objective function:

\[= \sum\limits_{i=1}^n \|R(a_i^{\prime} + \bar{a}) − t − (b_i^{\prime} + \bar{b})\|_2^2\] \[= \sum\limits_{i=1}^n \|Ra_i^{\prime} − b_i^{\prime} + (R\bar{a} − \bar{b} − t)\|_2^2\]

Let

\[t = R\bar{a} − \bar{b}\]

We’ll see that our original optimization problem over 2 variables (\(\mathbf{R},t\)) on \(\Big\{(a_i,b_i)\Big\}_{i=1}^n\) becomes an optimization problem over just a single variable \(\mathbf{R}\) on \(\Big\{(a_i^\prime,b_i^\prime)\Big\}_{i=1}^n\), which is far more tractable and easier to deal with in closed-form.

\[\sum\limits_{i=1}^n \|Ra_i − t − b_i \|_2^2 = \sum\limits_{i=1}^n \|Ra_i^{\prime} − b_i^{\prime}\|_2^2\] \[tr(RN)\]
from typing import Tuple
import numpy as np
def find_rigid_alignment(A: np.ndarray, B: np.ndarray) -> Tuple[np.ndarray,np.ndarray]:
	"""
	2-D or 3-D registration with known correspondences.
	Registration occurs in the zero centered coordinate system, and then
	must be transported back.

	Args:
	    A: Array of shape (N,D) -- Reference Point Cloud (target)
	    B: Array of shape (N,D) -- Point Cloud to Align (source)

	Returns:
	    R: optimal rotation (3,3)
	    t: optimal translation (3,)
	"""
	num_pts = A.shape[0]
	dim = A.shape[1]

	a_mean = np.mean(A, axis=0)
	b_mean = np.mean(B, axis=0)

	# Zero-center the point clouds
	A -= a_mean
	B -= b_mean

	N = np.zeros((dim, dim))
	for i in range(num_pts):
		N += A[i].reshape(dim,1) @ B[i].reshape(1,dim)
	N = A.T @ B

	U, D, V_T = np.linalg.svd(N)
	S = np.eye(dim)
	det = np.linalg.det(U) * np.linalg.det(V_T.T)
	
	# Check for reflection case
	if not np.isclose(det,1.):
		S[dim-1,dim-1] = -1

	R = U @ S @ V_T
	t = R @ b_mean.reshape(dim,1) - a_mean.reshape(dim,1)
	return R, -t.squeeze()

Similar code named the Umeyama Transform after[4] ships with Eigen as Umeyama. The Open3D library utilizes the Umeyama method also (source code here).

Roland Siegwart’s group at ETH Zurich has an efficient open-source C++ ICP implementation named libpointmatcher.

Local Methods

Iterated Closest Pair (ICP) [3]

  • Align the \(A\) points to their closest \(B\) neighbors, then repeat.

  • Converges, if starting positions are “close enough”.

Variants

Below we discuss two of many ICP variants: Exhaustive-Search ICP and Generalized ICP. A discussion of more variants can be found in [7].

As mentioned above, ICP relies upon a strong assumption: the scans (point clouds) \(A\) and \(B\) are positioned close to each other, i.e. we have a good initial alignment estimate. When it is clear that this assumption does not hold, one solution is to use more computation and to sample the space of possible initial alignments. Here is a short summary of several strengths and weaknesses of the Exhaustive Search method:

  • Compare (ideally) all alignments
  • Correspondence is determined by the alignment at which models are closest.
  • Provides optimal result
  • Can be unnecessarily slow
  • Does not generalize well to non-rigid deformations

Chen et al. describe more details of a RANSAC-based DARCES (data-aligned rigidity-constrained exhaustive search) method in [6]. Their goal is to check all possible data alignments of two given 3D data sets in an efficient way. They employ RANSAC to ensure that the model fitting is not influenced my outliers (robust estimation).

Generalized ICP

Segal et al. [5] introduce a method called Generalized ICP

Point-Cloud Registration with Scale Estimation

At times, it may be desirable to align not just two point clouds, but instead to align two pose graphs, perhaps when evaluating an SFM result with a ground truth model. However, there may be a gauge ambiguity (unknown scale) between the two models. Instead of finding the best SE(3) transformation, we’ll need to compute the best Sim(3) transformation, i.e. we must also estimate a scale factor \(s\) (see here) [8].

\[\underset{\mathbf{R}, t, s}{\mbox{minimize }} \sum\limits_{i=1}^N \| b_i - s\mathbf{R}a_i - t \|_2^2\]

Perhaps more accurately, we can write the “to” and “from” indices on our rotations and translations:

\[\underset{\mathbf{R}, t, s}{\mbox{minimize }} \sum\limits_{i=1}^N \| b_i - s({}^b\mathbf{R}_aa_i) - {}^bt_a \|_2^2\]

In this scenario, we can estimate the relative rotation \({}^b \mathbf{R}_a\) first, as the Karcher mean between noisy \({}^b \mathbf{R}_a\) estimates from each pose-pair.

Now, assuming that \({}^b \mathbf{R}_a\) has been computed, then the scale factor can become the only unknown in the minimization problem above via a special trick below. We are now working with the optimization problem below:

\[\begin{aligned} \underset{t, s}{\mbox{minimize }} & \sum\limits_{i=1}^N \| b_i - s({}^b\mathbf{R}_aa_i) - {}^bt_a \|_2^2 \\ \end{aligned}\]

Suppose we compute centroids \(\bar{a}\) and \(\bar{b}\) as we did before. Suppose we set \(t = \bar{b} - s R \bar{a}\). Then

\[\begin{aligned} \underset{t, s}{\mbox{minimize }} & \sum\limits_{i=1}^N \| b_i - s(R^\star a_i) - (\bar{b} - s R^\star \bar{a}) \|_2^2 \\ & \sum\limits_{i=1}^N \| b_i - s(R^\star a_i) - \bar{b} + s R^\star \bar{a} \|_2^2 \\ & \sum\limits_{i=1}^N \| b_i - \bar{b} - s(R^\star a_i) + s R^\star \bar{a} \|_2^2 \\ & \sum\limits_{i=1}^N \| b_i - \bar{b} - s(R^\star a_i) - s R^\star (-\bar{a}) \|_2^2 \\ & \sum\limits_{i=1}^N \| (b_i - \bar{b}) - sR^\star (a_i - \bar{a}) \|_2^2 \\ \end{aligned}\]

We’ll define new variables Using \(\tilde{a_i} = R^\star(a_i - \bar{a})\) and \(\tilde{b_i} = b_i - \bar{b}\) as we used earlier, then we get a optimization problem over a single unknown:

\[\begin{aligned} \underset{s}{\mbox{minimize }} & \sum\limits_{i=1}^N \| \tilde{b_i} - s\tilde{a_i}\|_2^2 \\ & \sum\limits_{i=1}^N \Big(\tilde{b_i} - s \tilde{a_i} \Big)^T \Big(\tilde{b_i} - s\tilde{a_i} \Big) \\ & \sum\limits_{i=1}^N \Big( \tilde{b_i}^T - s \tilde{a_i}^T \Big) \Big(\tilde{b_i} - s\tilde{a_i} \Big) \\ & \sum\limits_{i=1}^N \tilde{b_i}^T\tilde{b_i} - \sum\limits_{i=1}^N 2 s \tilde{a_i}^T \tilde{b_i} + \sum\limits_{i=1}^N s^2 \tilde{a_i}^T \tilde{a_i} \\ \end{aligned}\]

Now, we can set the gradient equal to zero, and optimize:

\[\begin{aligned} 0 &= \nabla_s \Bigg( \sum\limits_{i=1}^N \tilde{b_i}^T\tilde{b_i} - \sum\limits_{i=1}^N 2 s \tilde{a_i}^T \tilde{b_i} + \sum\limits_{i=1}^N s^2 \tilde{a_i}^T \tilde{a_i} \Bigg) \\ 0 &= \nabla_s \Bigg(\sum\limits_{i=1}^N 2 s \tilde{a_i}^T \tilde{b_i}\Bigg) + \nabla_s \Bigg( -\sum\limits_{i=1}^N s^2 \tilde{a_i}^T \tilde{a_i} \Bigg) \\ 0 &= \nabla_s \Bigg(s\sum\limits_{i=1}^N 2 \tilde{a_i}^T \tilde{b_i}\Bigg) + \nabla_s \Bigg( -s^2 \sum\limits_{i=1}^N \tilde{a_i}^T \tilde{a_i} \Bigg) \\ 0 &= \sum\limits_{i=1}^N 2 \tilde{a_i}^T \tilde{b_i} - 2s \sum\limits_{i=1}^N \tilde{a_i}^T \tilde{a_i} \\ 2s \sum\limits_{i=1}^N \tilde{a_i}^T \tilde{a_i} &= \sum\limits_{i=1}^N 2 \tilde{a_i}^T \tilde{b_i} \\ s^\star = \frac{\sum\limits_{i=1}^N 2 \tilde{a_i}^T \tilde{b_i}}{2\sum\limits_{i=1}^N \tilde{a_i}^T \tilde{a_i}} \end{aligned}\]

Toy Example and Implementation of Optimization over Sim(3)

We’ll now try out this method on a real-world example. Suppose we reconstructed a scene with 3 poses along a line. The ground truth also has 3 poses along a line, but at a different scale (2x), and shifted/rotated.

import numpy as np
from gtsam import Rot3, Pose3

# 90 degrees yaw
Rz90 = Rot3.Rz(np.deg2rad(90))

# Create source poses (three objects o1, o2, o3 living in the egovehicle "e" frame)
# Suppose they are 3d cuboids detected by an onboard sensor in the egovehicle frame
eTo0 = Pose3(Rot3(), np.array([1, 0, 0]))
eTo1 = Pose3(Rot3(), np.array([2, 0, 0]))
eTo2 = Pose3(Rot3(), np.array([4, 0, 0]))

eToi_list = [eTo0, eTo1, eTo2]

# Create destination poses
# (same three objects, but instead living in the world/city "w" frame)
wTo0 = Pose3(Rz90, np.array([0, 12, 0]))
wTo1 = Pose3(Rz90, np.array([0, 14, 0]))
wTo2 = Pose3(Rz90, np.array([0, 18, 0]))

wToi_list = [wTo0, wTo1, wTo2]

In GTSAM, the scale-integrated registration algorithm described above is implemented as Align():

we_pairs = Pose3Pairs(list(zip(wToi_list, eToi_list)))

# Recover the transformation wSe (i.e. world_S_egovehicle)
wSe = Similarity3.Align(we_pairs)

for wToi, eToi in zip(wToi_list, eToi_list):
    self.gtsamAssertEquals(wToi, wSe.transformFrom(eToi))

We can write a simple implementation ourselves, in Numpy. We’ll centralize the points:

A = np.array([ wToi.translation() for wToi in wToi_list ])
B = np.array([ eToi.translation() for eToi in eToi_list ])

a_centroid = A.mean(axis=0)
b_centroid = B.mean(axis=0)

# a_i's are the rows
A -= a_centroid
B -= b_centroid

Suppose we compute \(R\) as the Karcher mean of aRb terms. We’ll now estimate the scale s:

y = 0
x = 0

for i in range(len(wToi_list)):
    da = A[i] ## centralized point
    db = B[i]

    da_prime = wSe.rotation().matrix() @ db
    y += da.T @ da_prime
    x += da_prime.T @ da_prime

s = y / x

aTb = (a_centroid - s * ( wSe.rotation().matrix() @ b_centroid)) / s

Note that to construct the Sim(3) object, we divide t by s. This is required because the registration cost function minimizes \(||a - (sRb + t)||\), whereas Sim(3) computes \(s(Rb + t)\), i.e. t is multiplied by the scale again, so we must divide it here to offset this.

References

[1] Leonidas Guibas. Alignments and Correspondences, Geometric Features. CS 233: Geometric and Topological Data Analysis, 9 May 2018.

[2] Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns”, PAMI 1991. PDF.

[3] Paul Besl and Neil McKay. A Method for Registration of 3-D Shapes. TPAMI, 1992. PDF

[4] Jeff Phillips. Lecture 24: Iterative Closest Point and Earth Mover’s Distance. CPS296.2 Geometric Optimization. 10 April 2007. PDF.

[5] A. Segal, D. Haehnel, and S. Thrun. Generalized-ICP. In Robotics: Science and Systems (RSS), volume 2, page 435, 2009. PDF.

[6] Chen, C., Hung, Y., and Cheng, J. “RANSAC-Based DARCES: A New Approach to Fast Automatic Registration of Partially Overlapping Range Images,” Trans. PAMI, Vol. 21, No. 11, 1999.

[7]. Szymon Rusinkiewicz and Marc Levoy. Efficient Variants of the ICP Algorithm. PDF.

[8]. Timo Zinsser, Jochen Schimdt, Heinrich Niemann. Point Set Registration with Integrated Scale Estimation. International Conference on Pattern Recognition and Image Processing (PRIP 2005), pages 116-119. PDF.