Fast Nearest Neighbors
Table of Contents:
- The Nearest Neighbor Problem
- Nearest Neighbor Computation
- Brute Force Nearest Neighbors
- Brute Force Speedup with Numba JIT
- Vectorization: Speedup via Numpy Broadcasting
- Vectorization: Quadratic Expansion Derivation
- Implementation: Fast Affinity Matrix Computation via Quadratic Expansion
- Speed Comparison
- Using a GPU
- References
The Nearest Neighbor Problem
Finding closest points in a high-dimensional space is a recurring problem in computer vision, especially when performing feature matching (e.g. with SIFT [Lowe, 2004]) or computing Chamfer distances [Fan et al.], [Kurenkov et al.] for point set generation with deep networks. It is a cheaper way of finding point-to-point correspondences than the optimal bipartite matching that the Earth Mover’s Distance requires.
Brute force methods can be prohibitively slow, but much faster alternatives exist with a bit of linear algebra.
Nearest Neighbor Computation
Let \(\mathcal{A},\mathcal{B}\) be sets of points in \(\mathbb{R}^n\). We are interested in finding, for each point in \(\mathcal{A}\), its nearest neighbor in \(\mathcal{B}\). The nearest neighbor in \(\mathcal{B}\) of a point \(a \in \mathcal{A}\) is the point \(b^* = \underset{b \in \mathcal{B}}{\arg\min} \ \|a-b\|_2\). We can equivalently minimize the squared Euclidean distance \(\|a-b\|_2^2\), since the square function is monotonically increasing for non-negative values, and distances are always non-negative. We will see that using the squared Euclidean distance to find the nearest neighbor will spare us some computation later.
The expression \(\|a-b\|_2^2\) can be written as an inner product, \((a-b)^T(a-b)\). It is the squared Mahalanobis distance \((a-b)^T M (a-b)\) in the special case \(M=I\), i.e. when working in Euclidean space (computing \(\ell_2\) norms).
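As a quick sanity check, a few lines of Numpy confirm these equivalences (a minimal sketch; the example vectors are arbitrary):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Squared Euclidean distance three ways: squared norm, inner product,
# and squared Mahalanobis distance with M = I. All equal 25 here.
d_norm = np.linalg.norm(a - b) ** 2
d_inner = (a - b).dot(a - b)
d_mahal = (a - b).dot(np.eye(3)).dot(a - b)
assert np.allclose(d_norm, d_inner) and np.allclose(d_inner, d_mahal)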
Brute Force Nearest Neighbors
In the brute force regime, we loop through all points \(a_i \in \mathcal{A}\), then loop through all points \(b_j \in \mathcal{B}\), and compute the distance \(\|a_i - b_j\|\) with np.linalg.norm(A[i] - B[j]). This can be done with a double for-loop in Python:
import numpy as np

def naive_upper_triangular_compute_affinity_matrix(
    A: np.ndarray,
    B: np.ndarray
) -> np.ndarray:
    """Compute squared Euclidean distances between two point sets `A` and `B`.

    Args:
        A: Array of shape (m1, n).
        B: Array of shape (m2, n).

    Returns:
        Array of shape (m1, m2) representing squared Euclidean distances (not necessarily symmetric).
    """
    # Ensure feature vectors have the same length.
    if A.shape[1] != B.shape[1]:
        raise ValueError("Point sets A and B must have points of identical dimension.")
    m1, n = A.shape
    m2, _ = B.shape
    affinity_mat = np.zeros((m1, m2))
    for i in range(m1):  # rows
        for j in range(m2):  # cols
            diff = A[i] - B[j]
            norm = np.linalg.norm(diff)
            affinity_mat[i, j] = norm
    # Square entrywise so the affinity matrix contains squared Euclidean distances.
    return np.square(affinity_mat)
However, this method will be brutally slow for thousands, tens of thousands, or millions of points, which are quite common point cloud sizes in computer vision or robotics. We need a better way.
Brute Force Speedup with Numba JIT
A trivial way to speed up the double for-loop is to compile the code with Numba’s JIT (just-in-time) compiler:

from numba import jit

@jit(nopython=True)
def naive_upper_triangular_compute_affinity_matrix(
    A: np.ndarray,
    B: np.ndarray
) -> np.ndarray:
    # Function body is identical to the pure-Python version above.
    ...
This will give us some savings, but we can do much better.
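Note that Numba compiles the function the first time it is called, so a fair timing should exclude that first call. A minimal warm-up sketch (the tiny array sizes are arbitrary):

# Trigger JIT compilation on small inputs before timing the real workload.
_ = naive_upper_triangular_compute_affinity_matrix(
    np.random.rand(2, 3), np.random.rand(2, 3)
)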
Vectorization: Speedup via Numpy Broadcasting
Suppose we reshape \(B\) from \(B \in \mathbb{R}^{m_2 \times n}\) to \(B \in \mathbb{R}^{m_2 \times 1 \times n}\). Then, with \(A \in \mathbb{R}^{m_1 \times n}\), Numpy broadcasting makes \(A - B\) an array of shape \((m_2, m_1, n)\). Taking the norm along axis=2 yields an array of shape \((m_2, m_1)\), which can be transposed to \((m_1, m_2)\):
def compute_affinity_matrix_broadcasting(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Compute squared Euclidean distances between two point sets `A` and `B`.

    Args:
        A: Array of shape (m1, n).
        B: Array of shape (m2, n).

    Returns:
        Array of shape (m1, m2) representing squared Euclidean distances (not necessarily symmetric).
    """
    _, n = A.shape
    m2, _ = B.shape
    # (m1, n) - (m2, 1, n) broadcasts to shape (m2, m1, n).
    dists = np.linalg.norm(A - B.reshape(m2, 1, n), axis=2)
    # Transpose to (m1, m2) and square to obtain squared Euclidean distances.
    return np.square(dists.T)
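Equivalently, one can insert the new axis into \(A\) instead, which produces the difference tensor in \((m_1, m_2, n)\) order directly and avoids the final transpose. A sketch of this variant (the function name is ours, not from the original post):

def compute_affinity_matrix_broadcasting_v2(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Same computation, with the singleton axis inserted into A instead of B."""
    # (m1, 1, n) - (1, m2, n) broadcasts to (m1, m2, n); summing squared
    # differences over the last axis yields the (m1, m2) affinity matrix.
    return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)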
It turns out this is much faster than brute force. However, broadcasting materializes an intermediate array of shape \((m_2, m_1, n)\), which can exhaust memory for large point sets (note the OOM entry in the speed comparison below). There is another approach that can be faster still.
Vectorization: Quadratic Expansion Derivation
Now we wish to compute these distances on all pairs of points in entire datasets simultaneously. We form matrices \(A \in \mathbb{R}^{m_1 \times n}, B \in \mathbb{R}^{m_2 \times n}\) whose rows hold our high-dimensional points.
Let \(a,b \in \mathbb{R}^n\) be vectors:
\[\begin{aligned} \|a-b\|_2^2 &= (a-b)^T(a-b) \\ &= (a^T-b^T)(a-b) \\ &= a^Ta -b^Ta - a^Tb + b^Tb \end{aligned}\]Since \(-b^Ta\) and \(-a^Tb\) are scalars (inner products), we can swap the order of their arguments to find:
\[\begin{aligned} \|a-b\|_2^2 &= a^Ta -a^Tb - a^Tb + b^Tb \\ &= a^Ta -2 a^Tb + b^Tb \end{aligned}\]We will see that nearest neighbor computation for all points boils down to only 3 required matrix products: \(AA^T, BB^T, AB^T\).
Our goal is to find \(\|a_i - b_j\|_2^2 = a_i^Ta_i -2 a_i^Tb_j + b_j^Tb_j\) for all \(i,j\). We wish to build an affinity matrix \(D\) such that the \(D_{ij}\) entry contains the squared distance between \(a_i, b_j\).
Consider sets \(\mathcal{A},\mathcal{B}\) with 3 points each. We form \(AA^T, BB^T\):
\[AA^T = \begin{bmatrix} - & a_1 & - \\ - & a_2 & - \\ - & a_3 & - \end{bmatrix} \begin{bmatrix} | & | & | \\ a_1^T & a_2^T & a_3^T \\ | & | & | \end{bmatrix} = \begin{bmatrix} a_1^Ta_1 & a_1^T a_2 & a_1^T a_3 \\ a_2^Ta_1 & a_2^T a_2 & a_2^T a_3 \\ a_3^Ta_1 & a_3^T a_2 & a_3^T a_3 \end{bmatrix}\] \[BB^T = \begin{bmatrix} - & b_1 & - \\ - & b_2 & - \\ - & b_3 & - \end{bmatrix} \begin{bmatrix} | & | & | \\ b_1^T & b_2^T & b_3^T \\ | & | & | \end{bmatrix} = \begin{bmatrix} b_1^T b_1 & b_1^T b_2 & b_1^T b_3 \\ b_2^T b_1 & b_2^T b_2 & b_2^T b_3 \\ b_3^T b_1 & b_3^T b_2 & b_3^T b_3 \end{bmatrix}\]We are interested only in the diagonal elements \(a_i^Ta_i\) and \(b_i^Tb_i\). We will define \(T_A\) and \(T_B\) to contain tiled rows, where each row is a diagonal of \(AA^T\) or \(BB^T\), respectively:
\[\begin{array}{ll} T_A = \begin{bmatrix} a_1^Ta_1 & a_2^Ta_2 & a_3^Ta_3 \\ a_1^Ta_1 & a_2^Ta_2 & a_3^Ta_3 \\ a_1^Ta_1 & a_2^Ta_2 & a_3^Ta_3 \end{bmatrix}, & T_B = \begin{bmatrix} b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \end{bmatrix} \end{array}\]We now form \(AB^T\):
\[AB^T = \begin{bmatrix} - & a_1 & - \\ - & a_2 & - \\ - & a_3 & - \end{bmatrix} \begin{bmatrix} | & | & | \\ b_1^T & b_2^T & b_3^T \\ | & | & | \end{bmatrix} = \begin{bmatrix} b_1^Ta_1 & b_2^Ta_1 & b_3^Ta_1 \\ b_1^Ta_2 & b_2^Ta_2 & b_3^Ta_2 \\ b_1^Ta_3 & b_2^Ta_3 & b_3^Ta_3 \end{bmatrix}\]Our desired affinity matrix \(D \in \mathbb{R}^{3 \times 3}\) will contain entries \(D_{ij} = \|a_i - b_j\|_2^2\):
\[D = \begin{bmatrix} \| a_1 - b_1 \|_2^2 & \| a_1 - b_2 \|_2^2 & \|a_1 - b_3 \|_2^2 \\ \| a_2 - b_1 \|_2^2 & \| a_2 - b_2 \|_2^2 & \|a_2 - b_3 \|_2^2 \\ \| a_3 - b_1 \|_2^2 & \| a_3 - b_2 \|_2^2 & \|a_3 - b_3 \|_2^2 \end{bmatrix}\]It turns out that:
\[\begin{aligned} D &= T_A^T + T_B - 2 AB^T \\ D &= \begin{bmatrix} a_1^Ta_1 & a_1^Ta_1 & a_1^Ta_1 \\ a_2^Ta_2 & a_2^Ta_2 & a_2^Ta_2 \\ a_3^Ta_3 & a_3^Ta_3 & a_3^Ta_3 \end{bmatrix} + \begin{bmatrix} b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \end{bmatrix} - 2 \begin{bmatrix} b_1^Ta_1 & b_2^Ta_1 & b_3^Ta_1 \\ b_1^Ta_2 & b_2^Ta_2 & b_3^Ta_2 \\ b_1^Ta_3 & b_2^Ta_3 & b_3^Ta_3 \end{bmatrix} \end{aligned}\]As you can see above, \(D_{ij} = \|a_i - b_j\|_2^2 = a_i^Ta_i - 2 a_i^Tb_j + b_j^Tb_j\) for all \(i,j\), exactly as desired.
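The identity is easy to verify numerically on random data before committing to a vectorized implementation. A quick check (not part of the final code):

A = np.random.rand(3, 5)
B = np.random.rand(3, 5)

T_A = np.tile(np.diag(A @ A.T), (3, 1))  # Each row is the diagonal of AA^T.
T_B = np.tile(np.diag(B @ B.T), (3, 1))  # Each row is the diagonal of BB^T.
D = T_A.T + T_B - 2 * A @ B.T

# Compare against direct computation of ||a_i - b_j||^2.
D_direct = np.array([[np.sum((a - b) ** 2) for b in B] for a in A])
assert np.allclose(D, D_direct)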
Implementation: Fast Affinity Matrix Computation via Quadratic Expansion
The implementation requires just a few lines in Numpy:
import numpy as np

def fast_affinity_mat_compute(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Compute squared Euclidean distances between two point sets `A` and `B`.

    Args:
        A: Array of shape (m1, n).
        B: Array of shape (m2, n).

    Returns:
        Array of shape (m1, m2) representing squared Euclidean distances (not necessarily symmetric).
    """
    # Ensure feature vectors have the same length.
    if A.shape[1] != B.shape[1]:
        raise ValueError("Point sets A and B must have points of identical dimension.")
    m1, _ = A.shape
    m2, _ = B.shape
    ab_T = np.matmul(A, B.T)
    # The Gram matrix diagonals hold the squared norms a_i^T a_i and b_j^T b_j.
    # (np.sum(A**2, axis=1) yields the same values without forming the full Gram matrix.)
    a_sqr = np.diag(A.dot(A.T))
    b_sqr = np.diag(B.dot(B.T))
    # Tile into T_A^T and T_B from the derivation above.
    a_sqr = np.tile(a_sqr, (m2, 1)).T
    b_sqr = np.tile(b_sqr, (m1, 1))
    return a_sqr + b_sqr - 2 * ab_T
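With the affinity matrix in hand, the original nearest-neighbor query reduces to one argmin per row. A brief usage sketch (the array sizes are arbitrary):

A = np.random.rand(1000, 128)  # Query points.
B = np.random.rand(1350, 128)  # Reference points.

D = fast_affinity_mat_compute(A, B)
nn_indices = np.argmin(D, axis=1)  # nn_indices[i] indexes the nearest neighbor of A[i] in B.
nn_sq_dists = D[np.arange(A.shape[0]), nn_indices]  # Squared distances to those neighbors.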
Speed Comparison
We now compare the runtimes of the three approaches. We’ll use the following snippet:
import time

def unit_test_arr_arr(m1: int, m2: int, n: int) -> None:
    """Time each affinity matrix implementation and verify the outputs agree."""
    A = np.random.rand(m1, n)  # Point set 1.
    B = np.random.rand(m2, n)  # Point set 2.

    start = time.time()
    gt_aff = naive_upper_triangular_compute_affinity_matrix(A, B)
    duration = time.time() - start
    print(f"Brute force duration: {duration:.3f}")

    start = time.time()
    pred_aff2 = compute_affinity_matrix_broadcasting(A, B)
    duration = time.time() - start
    print(f"Broadcasting duration: {duration:.3f}")

    start = time.time()
    pred_aff = fast_affinity_mat_compute(A, B)
    duration = time.time() - start
    print(f"Quadratic expansion duration: {duration:.3f}")

    print("Sum of abs differences: ", np.absolute(gt_aff - pred_aff).sum())
    assert np.allclose(gt_aff, pred_aff, atol=1e-2)
    print("Sum of abs differences: ", np.absolute(gt_aff - pred_aff2).sum())
    assert np.allclose(gt_aff, pred_aff2, atol=1e-2)

if __name__ == '__main__':
    m1 = 1000
    m2 = 1350
    n = 128
    np.random.seed(2)
    unit_test_arr_arr(m1, m2, n)
Now consider the speedup we’ve achieved when we measure runtimes:
| \(m_1, m_2, n\) | Brute Force | Brute Force w/ JIT | Broadcasting | Vectorized Quad. Expansion |
|---|---|---|---|---|
| 10k, 1350, 3 | 60.2 s. | 6.6 s. | 1.1 s. | 1.3 s. |
| 1000, 1350, 128 | 7.2 s. | 2.5 s. | 2.0 s. | 0.07 s. |
| 5000, 5000, 128 | 101.6 s. | 17.5 s. | OOM | 1.3 s. |
Using a GPU
We’ll now implement the vectorized method on the GPU using PyTorch; this is a well-discussed topic in the PyTorch forums.
from typing import Optional

import torch

def pairwise_distances(
    A: torch.Tensor,
    B: Optional[torch.Tensor] = None
) -> torch.Tensor:
    """Compute squared Euclidean distances between two point sets `A` and `B`.

    Args:
        A: Tensor of shape (m1, n).
        B: Tensor of shape (m2, n). If None, self-distances within `A` are computed.

    Returns:
        Tensor of shape (m1, m2) representing squared Euclidean distances (not necessarily symmetric).
    """
    A_norm = (A**2).sum(1).view(-1, 1)
    if B is not None:
        B_t = torch.transpose(B, 0, 1)
        B_norm = (B**2).sum(1).view(1, -1)
    else:
        # Self-distances.
        B_t = torch.transpose(A, 0, 1)
        B_norm = A_norm.view(1, -1)
    dist = A_norm + B_norm - 2.0 * torch.mm(A, B_t)
    # If B is None, the diagonal should be exactly zero; floating-point error can
    # make it slightly nonzero, which can be corrected with:
    #   dist = dist - torch.diag(dist.diag())
    # Clamp tiny negative values arising from floating-point error.
    return torch.clamp(dist, min=0.0)
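A brief usage sketch on the GPU (assuming CUDA is available; the tensor sizes are arbitrary):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

A = torch.rand(5000, 128, device=device)
B = torch.rand(5000, 128, device=device)

D = pairwise_distances(A, B)         # (5000, 5000) squared distances.
nn_indices = torch.argmin(D, dim=1)  # Nearest neighbor in B for each row of A.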
See the open-UCN repository for a real-world example of this computation.
References
- David Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 2004. [PDF].
- H. Fan, H. Su, and L. J. Guibas. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. CVPR 2017. [PDF].
- A. Kurenkov, J. Ji, A. Garg, V. Mehta, J. Gwak, C. B. Choy, and S. Savarese. DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image. WACV 2018. [PDF].