Fast Nearest Neighbors
Table of Contents:
 The Nearest Neighbor Problem
 Nearest Neighbor Computation
 Brute Force Nearest Neighbors
 Brute Force Compiled w/ Numba JIT
 Vectorization: Speedup via Numpy Broadcasting
 Vectorization: Quadratic Expansion Derivation
 Implementation: Fast Affinity Matrix Computation via Quadratic Expansion
 Speed Comparison
The Nearest Neighbor Problem
Finding closest points in a highdimensional space is a reoccurring problem in computer vision, especially when performing feature matching (e.g. with SIFT [Lowe, 2004]) or computing Chamfer distances [Fan et al.], [Kurenkov et al.] for point set generation with deep networks. It is a cheaper way of finding pointtopoint correspondences than optimal bipartite matching, as the Earth Mover’s Distance requires.
Brute force methods can be prohibitively slow and much faster ways exist of computing with a bit of linear algebra.
Nearest Neighbor Computation
Let \(\mathcal{A,B}\) be sets. We are interested in the finding the nearest neighbor for each point in \(\mathcal{A}\). Let \(a,b \in \mathbb{R}^n\) be two points such that \(a \in \mathcal{A}\), \(b \in \mathcal{B}\). The nearest neighbor in \(\mathcal{B}\) of a point \(a \in \mathcal{A}\) is a point \(b \in \mathcal{B}\), such that \(b = \mbox{arg} \underset{b \in \mathcal{B}}{\mbox{ min}} \ab\_2\). We can equivalently use the squared Euclidean distance \(\ab\_2^2\), since the square function is monotonically increasing for positive values, and distances are always positive. We will see that using the squared Euclidean distance to find the nearest neighbor will spare us some computation later.
The expression \(\ab\_2^2\) is equivalent to an inner product. It is equivalent to the Mahalanobis distance, \((ab)^TA(ab)\), when \(A=I\), when working in Euclidean space (computing \(\ell_2\) norms).
Brute Force Nearest Neighbors
In the brute force regime, we would loop through all points \(a_i \in \mathcal{A}\), and then loop through all points \(b_j \in \mathcal{B}\), and find the distance \(\a_i  b_j\\) with np.linalg.norm(A[i]  B[j])
. This can be done with a double forloop in Python:
def naive_upper_triangular_compute_affinity_matrix(
A: np.ndarray,
B: np.ndarray
) > np.ndarray:
"""Compute squared Euclidean distances between two point sets `A` and `B`.
Args:
A: Array of shape (m1, n).
B: Array of shape (m2, n).
Returns:
Array of shape (m1,m2) representing squared Euclidean distances (not necessarily symmetric).
"""
# Ensure feature vectors have the same length.
if A.shape[1] != B.shape[1]:
raise ValueError("Point sets A and B must have points of identical dimension.")
m1, n = A.shape
m2, n = B.shape
affinity_mat = np.zeros((m1,m2))
for i in range(m1): # rows
for j in range(m2): # cols
diff = A[i]  B[j]
norm = np.linalg.norm(diff)
affinity_mat[i,j] = norm
# Affinity matrix contains the Mahalanobis distances.
return np.square(affinity_mat)
However, this method will be brutally slow for thousands, tens of thousands, or millions of points, which are quite common point cloud sizes in computer vision or robotics. We need a better way.
Brute Force Speedup with Numba JIT
A trivial way to speed up the doublefor loops is to compile the code with Numba JIT.
from numba import jit
@jit(nopython=True)
def naive_upper_triangular_compute_affinity_matrix(
A: np.ndarray,
B: np.ndarray
) > np.ndarray:
...
This will give us some savings, but we can do much better.
Vectorization: Speedup via Numpy Broadcasting
Suppose we reshape \(B\) from B \(\in \mathbb{R}^{m_2, n}\) to B \(\in \mathbb{R}^{m_2, 1, n}\). Then, if we use A \(\in \mathbb{R}^{m_1, n}\) in Numpy, \(A  B\) would have shape \((m_2, m_1, n)\). Taking the norm along axis=2
would yield \((m_2, m_1)\), which can be reshaped to \((m_1, m_2)\):
def compute_affinity_matrix_broadcasting(A: np.ndarray, B: np.ndarray) > np.ndarray:
"""Compute squared Euclidean distances between two point sets `A` and `B`.
Args:
A: Array of shape (m1, n).
B: Array of shape (m2, n).
Returns:
Array of shape (m1,m2) representing squared Euclidean distances (not necessarily symmetric).
"""
_, n = A.shape
m2, _ = B.shape
dists = np.linalg.norm(A  B.reshape(m2,1,n), axis=2)
return np.square(dists.T)
It turns out this is much faster than brute force, but there is another approach that can be yet faster.
Vectorization: Quadratic Expansion Derivation
Now we wish to compute these distances on all pairs of points in entire datasets simultaneously. We can form matrices \(A \in R^{m_1 \times n}, B \in R^{m_2 \times n}\) that hold our highdimensional points.
Let \(a,b\) be vectors, i.e. \(a,b \in \mathbb{R}^n\) are vectors:
\[\begin{aligned} \ab\_2^2 &= (ab)^T(ab) \\ &= (a^Tb^T)(ab) \\ &= a^Ta b^Ta  a^Tb + b^Tb \end{aligned}\]Since \(b^Ta\) and \( a^Tb\) are scalars (inner products), we can swap the order or arguments to find:
\[\begin{aligned} \ab\_2^2 &= a^Ta a^Tb  a^Tb + b^Tb \\ &= a^Ta 2 a^Tb + b^Tb \end{aligned}\]We will see that nearest neighbor computation for all points boils down to only 3 required matrix products: \(AA^T, BB^T, AB^T\).
Our goal is to find \(\a_i  b_j\_2^2 = a_i^Ta_i 2 a_i^Tb_j + b_j^Tb_j\) for all \(i,j\). We wish to build an affinity matrix \(D\) such that the \(D_{ij}\) entry contains the squared distance between \(a_i, b_j\).
Consider a sets \(\mathcal{A,B}\) with 3 points each. We form \(AA^T, BB^T\):
\[AA^T = \begin{bmatrix}  & a_1 &  \\  & a_2 &  \\  & a_3 &  \end{bmatrix} \begin{bmatrix}  &  &  \\ a_1^T & a_2^T & a_3^T \\  &  &  \end{bmatrix} = \begin{bmatrix} a_1^Ta_1 & a_1^T a_2 & a_1^T a_3 \\ a_2^Ta_1 & a_2^T a_2 & a_2^T a_3 \\ a_3^Ta_1 & a_3^T a_2 & a_3^T a_3 \end{bmatrix}\] \[BB^T = \begin{bmatrix}  & b_1 &  \\  & b_2 &  \\  & b_3 &  \end{bmatrix} \begin{bmatrix}  &  &  \\ b_1^T & b_2^T & b_3^T \\  &  &  \end{bmatrix} = \begin{bmatrix} b_1^T b_1 & b_1^T b_2 & b_1^T b_3 \\ b_2^T b_1 & b_2^T b_2 & b_2^T b_3 \\ b_3^T b_1 & b_3^T b_2 & b_3^T b_3 \end{bmatrix}\]We are interested only in the diagonal elements \(a_i^Ta_i\) and \(b_i^Tb_i\). We will define \(T_A\) and \(T_B\) to contain tiled rows, where each row is a diagonal of \(AA^T\) or \(BB^T\), respectively:
\[\begin{array}{ll} T_A = \begin{bmatrix} a_1^Ta_1 & a_2^Ta_2 & a_3^Ta_3 \\ a_1^Ta_1 & a_2^Ta_2 & a_3^Ta_3 \\ a_1^Ta_1 & a_2^Ta_2 & a_3^Ta_3 \end{bmatrix}, & T_B = \begin{bmatrix} b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \end{bmatrix} \end{array}\]We now form \(AB^T\):
\[AB^T = \begin{bmatrix}  & a_1 &  \\  & a_2 &  \\  & a_3 &  \end{bmatrix} \begin{bmatrix}  &  &  \\ b_1^T & b_2^T & b_3^T \\  &  &  \end{bmatrix} = \begin{bmatrix} b_1^Ta_1 & b_2^Ta_1 & b_3^Ta_1 \\ b_1^Ta_2 & b_2^Ta_2 & b_3^Ta_2 \\ b_1^Ta_3 & b_2^Ta_3 & b_3^Ta_3 \end{bmatrix}\]Our desired affinity matrix \(D \in \mathbf{R}^{3 \times 3}\) will contain entries \(D_{ij} = \a_i  b_j\_2^2\):
\[D = \begin{bmatrix} \ a_1  b_1 \_2^2 & \ a_1  b_2 \_2^2 & \a_1  b_3 \_2^2 \\ \ a_2  b_1 \_2^2 & \ a_2  b_2 \_2^2 & \a_2  b_3 \_2^2 \\ \ a_3  b_1 \_2^2 & \ a_3  b_2 \_2^2 & \a_3  b_3 \_2^2 \end{bmatrix}\]In turns out that:
\[\begin{aligned} D &= T_A^T + T_B  2 AB^T \\ D &= \begin{bmatrix} a_1^Ta_1 & a_1^Ta_1 & a_1^Ta_1 \\ a_2^Ta_2 & a_2^Ta_2 & a_2^Ta_2 \\ a_3^Ta_3 & a_3^Ta_3 & a_3^Ta_3 \end{bmatrix} + \begin{bmatrix} b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \\ b_1^T b_1 & b_2^T b_2 & b_3^T b_3 \end{bmatrix}  2 \begin{bmatrix} b_1^Ta_1 & b_2^Ta_1 & b_3^Ta_1 \\ b_1^Ta_2 & b_2^Ta_2 & b_3^Ta_2 \\ b_1^Ta_3 & b_2^Ta_3 & b_3^Ta_3 \end{bmatrix} \end{aligned}\]Since as you can see above, \(D_{ij} = \a_i  b_j\_2^2 = a_i^Ta_i 2 a_i^Tb_j + b_j^Tb_j\) for all \(i,j\).
Implementation: Fast Affinity Matrix Computation via Quadratic Expansion
The implementation requires just a few lines in Numpy:
import numpy as np
def fast_affinity_mat_compute(A: np.ndarray, B: np.ndarray) > np.ndarray:
"""Compute squared Euclidean distances between two point sets `A` and `B`.
Args:
A: Array of shape (m1, n).
B: Array of shape (m2, n).
Returns:
Array of shape (m1,m2) representing squared Euclidean distances (not necessarily symmetric).
"""
# Ensure feature vectors have the same length.
if A.shape[1] != B.shape[1]:
raise ValueError("Point sets A and B must have points of identical dimension.")
m1, n = A.shape
m2, _ = B.shape
ab_T = np.matmul(A, B.T )
a_sqr = np.diag(A.dot(A.T))
b_sqr = np.diag(B.dot(B.T))
a_sqr = np.tile( a_sqr, (m2,1) ).T
b_sqr = np.tile( b_sqr, (m1,1) )
return a_sqr + b_sqr  2 * ab_T
Speed Comparison
We now demonstrate the runtime of the three approaches approach. We’ll use the following snippet:
import time
def unit_test_arr_arr(m1: int, m2: int, n: int) > None:
""" """
A = np.random.rand(m1,n) # Point set 1.
B = np.random.rand(m2,n) # Point set 2.
start = time.time()
gt_aff = naive_upper_triangular_compute_affinity_matrix(A, B)
end = time.time()
duration = end  start
print(f"Duration: {duration:.3f}")
start = time.time()
pred_aff2 = compute_affinity_matrix_broadcasting(A, B)
end = time.time()
duration = end  start
print(f"Duration: {duration:.3f}")
start = time.time()
pred_aff = fast_affinity_mat_compute(A, B)
end = time.time()
duration = end  start
print(f"Duration: {duration:.3f}")
print("Sum of abs differences: ", np.absolute(gt_aff  pred_aff).sum())
assert np.allclose(gt_aff, pred_aff, atol=1e2)
print("Sum of abs differences: ", np.absolute(gt_aff  pred_aff2).sum())
assert np.allclose(gt_aff, pred_aff2, atol=1e2)
if __name__ == '__main__':
m1 = 1000
m2 = 1350
n = 128
np.random.seed(2)
unit_test_arr_arr(m1, m2, n)
Now consider the speedup we’ve achieved when we measure runtimes:
\(m_1, m_2, n\)  Brute Force  Brute Force w/ JIT 
Broadcasting  Vectorized Quad. Expansion 

10k, 1350, 3  60.2 s.  6.6 s.  1.1 s.  1.3 s. 
1000, 1350, 128  7.2 s.  2.5  2.0 s.  0.07 s. 
5000, 5000, 128  101.6 s.  17.5  OOM  1.3 s. 
Using a GPU
We’ll now implement both vectorized methods for GPU using Pytorch, and run another speed comparison.
Welldiscussed topic in the Pytorch forums.
import torch
def pairwise_distances(
A: torch.Tensor,
B: Optional[torch.Tensor] = None
) > torch.Tensor:
"""Compute squared Euclidean distances between two point sets `A` and `B`.
Args:
A: Array of shape (m1, n).
B: Array of shape (m2, n). If none, selfdistances w/ `A` are computed.
Returns:
Array of shape (m1,m2) representing squared Euclidean distances (not necessarily symmetric).
"""
A_norm = (A**2).sum(1).view(1, 1)
if B is not None:
B_t = torch.transpose(B, 0, 1)
B_norm = (B**2).sum(1).view(1, 1)
else:
# Selfdistances.
B_t = torch.transpose(A, 0, 1)
B_norm = A_norm.view(1, 1)
dist = A_norm + B_norm  2.0 * torch.mm(A, B_t)
# Ensure diagonal is zero if A=B
# if B is None:
# dist = dist  torch.diag(dist.diag)
return torch.clamp(dist, 0.0, np.inf)
See openUCN
References

David Lowe. Distinctive Image Features from ScaleInvariant Keypoints. IJCV, 2004. [PDF].

H. Fan, H. Su, and L. J. Guibas. A point set generation network for 3d object reconstruction from a single image. CVPR 2017. [PDF].

A. Kurenkov, J. Ji, A. Garg, V. Mehta, J. Gwak, C. B. Choy, and S. Savarese. Deformnet: Freeform deformation network for 3d shape reconstruction from a single image. WACV 2018. [PDF].