Multi-View Stereo

Assume we know the transformation from the reference camera to the physical cameras.

Plane-Sweeping Stereo

Introduced by Collins [1].

  • Choose a reference view
  • Sweep a family of planes at different depths with respect to the reference camera

Each plane induces a homography that warps each input image into the reference view. Stacking the per-plane photoconsistency errors gives a cost volume over candidate depths (equivalently, disparities); for each pixel we keep the depth with low SSD error (or any other photoconsistency measure) across the other views.
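The sweep can be sketched in a few lines of NumPy. This is a toy example with made-up calibration: per-pixel squared differences stand in for windowed SSD, nearest-neighbor rounding stands in for proper interpolation, and the warp is done by backprojecting each reference pixel to the candidate depth and projecting it into the neighbor view (equivalent to applying the plane-induced homography):

```python
import numpy as np

def project(K_ref, K_nb, R, t, uv, d):
    """Backproject reference pixels uv (N,2) to depth d, project into the neighbor view."""
    rays = np.linalg.inv(K_ref) @ np.vstack([uv.T, np.ones(len(uv))])  # (3, N)
    Xn = R @ (rays * d) + t[:, None]       # 3D points moved into the neighbor frame
    p = K_nb @ Xn
    return (p[:2] / p[2]).T                # (N, 2) neighbor pixel coordinates

def plane_sweep(ref, nb, K_ref, K_nb, R, t, depths):
    """Cost volume over candidate depths (1x1 SSD for brevity); argmin depth per pixel."""
    H, W = ref.shape
    vs, us = np.mgrid[0:H, 0:W]
    uv = np.stack([us.ravel(), vs.ravel()], axis=1).astype(float)
    cost = np.empty((len(depths), H * W))
    for k, d in enumerate(depths):
        p = np.rint(project(K_ref, K_nb, R, t, uv, d)).astype(int)
        valid = (p[:, 0] >= 0) & (p[:, 0] < W) & (p[:, 1] >= 0) & (p[:, 1] < H)
        c = np.full(H * W, np.inf)         # out-of-bounds pixels carry no evidence
        c[valid] = (ref.ravel()[valid] - nb[p[valid, 1], p[valid, 0]]) ** 2
        cost[k] = c
    return np.asarray(depths)[np.argmin(cost, axis=0)].reshape(H, W)

# Synthetic check: a fronto-parallel scene at depth 5 seen with a pure horizontal
# baseline of 0.5 gives a 100 * 0.5 / 5 = 10 px shift between the two views.
rng = np.random.default_rng(0)
ref = rng.random((40, 60))
nb = np.roll(ref, 10, axis=1)
K = np.array([[100.0, 0, 30], [0, 100.0, 20], [0, 0, 1]])
depth_map = plane_sweep(ref, nb, K, K, np.eye(3), np.array([0.5, 0.0, 0.0]),
                        depths=[2.0, 2.5, 3.125, 5.0, 10.0])
```

At the true depth the warped neighbor matches the reference exactly, so the argmin recovers 5.0 for the pixels whose match stays inside the image.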

Materials by Deva Ramanan (CMU), Linda Shapiro (UW), Lana Lazebnik (UNC), and Dan Huttenlocher (Cornell).

Given two cameras \(P = K [ I | 0]\) and \(P^\prime = K^\prime [ R | t]\), and a plane \(\pi = (n^T,d)^T\) (points \(X\) on the plane satisfy \(n^T X + d = 0\)):

The homography \(x^\prime = Hx\) is defined as \(H = K^\prime (R - tn^T/d) K^{-1}\)
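A quick numerical check (with made-up calibration and plane) that this homography agrees with directly projecting a point on the plane into both views:

```python
import numpy as np

# Made-up calibration: identical intrinsics, small rotation about y, a translation.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
th = 0.1
R = np.array([[np.cos(th), 0, np.sin(th)], [0, 1, 0], [-np.sin(th), 0, np.cos(th)]])
t = np.array([0.2, 0.0, 0.05])

# Plane pi = (n^T, d)^T with n^T X + d = 0: the plane z = 4 has n = (0,0,1), d = -4.
n = np.array([0.0, 0.0, 1.0])
d = -4.0

# Induced homography H = K' (R - t n^T / d) K^{-1}  (here K' = K).
H = K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

# A 3D point on the plane, projected into both views.
X = np.array([0.5, -0.3, 4.0])
x = K @ X;            x = x / x[2]
xp = K @ (R @ X + t); xp = xp / xp[2]

# H maps the first view's projection onto the second view's projection.
Hx = H @ x; Hx = Hx / Hx[2]
```

For points on the plane, \(n^T X = -d\), so \((R - tn^T/d)X = RX + t\), which is exactly the second camera's view of the point.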

Yao et al. [2] use a per-view, per-depth homography in MVSNet: \(H_i(d) = K_i \cdot R_i \cdot \Bigg( I - \frac{(t_1 - t_i) \cdot n_1^T}{d} \Bigg) \cdot R_1^T \cdot K_1^{-1}\), where view 1 is the reference and \(n_1\) its principal axis. (The paper prints \(K_1^T\) as the last factor, but it must be \(K_1^{-1}\) for \(H_1(d)\) to reduce to the identity.)
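A quick consistency check with made-up numbers: for the reference view itself (\(i = 1\)), the warp must be the identity. That holds when the last factor is \(K_1^{-1}\), but not with the \(K_1^T\) printed in the paper, which appears to be a typo:

```python
import numpy as np

# Made-up reference-view calibration (i = 1).
K1 = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
c, s = np.cos(0.3), np.sin(0.3)
R1 = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])
t1 = np.array([1.0, -2.0, 0.5])
n1 = R1.T @ np.array([0, 0, 1.0])   # reference principal axis
d = 4.0

# For i = 1 the middle factor is I (since t_1 - t_1 = 0), so H_1(d)
# collapses to K_1 R_1 R_1^T (last factor): identity only for K_1^{-1}.
middle = np.eye(3) - np.outer(t1 - t1, n1) / d
H_inv = K1 @ R1 @ middle @ R1.T @ np.linalg.inv(K1)   # identity
H_T   = K1 @ R1 @ middle @ R1.T @ K1.T                # K_1 K_1^T, not identity
```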

Patchmatch Stereo

Randomized correspondence search: initialize each pixel's match (a disparity, or a slanted support plane) at random, then iterate between propagating good estimates from neighboring pixels and refining them by random perturbation.
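A toy 1-D sketch of the idea on a rectified pair (window size 1 and integer disparities for brevity; real PatchMatch stereo matches slanted support planes with subpixel refinement):

```python
import numpy as np

def cost(left, right, y, x, d):
    """1x1 absolute-difference matching cost; inf when the match falls off the image."""
    xr = x - d
    return abs(left[y, x] - right[y, xr]) if 0 <= xr < right.shape[1] else np.inf

def patchmatch_stereo(left, right, max_disp, iters=4, seed=0):
    """Toy 1-D PatchMatch: random init, neighbor propagation, random search."""
    rng = np.random.default_rng(seed)
    H, W = left.shape
    disp = rng.integers(0, max_disp + 1, size=(H, W))   # random initialization
    best = np.array([[cost(left, right, y, x, disp[y, x]) for x in range(W)]
                     for y in range(H)])
    for it in range(iters):
        step = 1 if it % 2 == 0 else -1                 # alternate scan direction
        ys = range(H) if step == 1 else range(H - 1, -1, -1)
        xs = range(W) if step == 1 else range(W - 1, -1, -1)
        for y in ys:
            for x in xs:
                # propagation: adopt an already-visited neighbor's disparity if better
                for ny, nx in ((y, x - step), (y - step, x)):
                    if 0 <= ny < H and 0 <= nx < W:
                        c = cost(left, right, y, x, disp[ny, nx])
                        if c < best[y, x]:
                            best[y, x], disp[y, x] = c, disp[ny, nx]
                # random search: a fresh sample can escape a bad local choice
                cand = rng.integers(0, max_disp + 1)
                c = cost(left, right, y, x, cand)
                if c < best[y, x]:
                    best[y, x], disp[y, x] = c, cand
    return disp

# Synthetic rectified pair with a constant true disparity of 4.
rng = np.random.default_rng(1)
left = rng.random((30, 40))
right = np.roll(left, -4, axis=1)   # so right[y, x - 4] == left[y, x]
disp = patchmatch_stereo(left, right, max_disp=8)
```

Even with a handful of lucky random initializations, propagation spreads the zero-cost disparity across the image within a few alternating scans.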

MVSNet

Naive Fusion: Backproject each Depth Map to World Points

Since

\[\begin{bmatrix} u \cdot d \\ v \cdot d \\ d \end{bmatrix} = K_{ref} \, p_c \qquad \Longleftrightarrow \qquad p_c = K_{ref}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \cdot d\]

where \(d\) is the depth.

Get the point coordinates inside the world frame:

\[p_w = {}^wT_c \, p_c = {}^wT_c \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}\]

(in homogeneous coordinates, with \({}^wT_c\) the 4×4 camera-to-world transform).
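The two steps combine into a short backprojection routine (a sketch with made-up intrinsics and pose):

```python
import numpy as np

def backproject_to_world(depth, K, T_wc):
    """Lift a depth map to world points: p_c = d * K^{-1} [u, v, 1]^T, then p_w = T_wc p_c."""
    H, W = depth.shape
    vs, us = np.mgrid[0:H, 0:W]
    pix = np.vstack([us.ravel(), vs.ravel(), np.ones(H * W)])   # homogeneous pixels (3, N)
    p_c = np.linalg.inv(K) @ pix * depth.ravel()                # camera-frame points (3, N)
    p_w = T_wc @ np.vstack([p_c, np.ones(H * W)])               # 4x4 camera-to-world (4, N)
    return p_w[:3].T                                            # (N, 3) world points

# Made-up example: constant depth 2, camera shifted 1 unit along world x.
K = np.array([[100.0, 0, 5], [0, 100.0, 5], [0, 0, 1]])
T_wc = np.eye(4); T_wc[0, 3] = 1.0
pts = backproject_to_world(np.full((10, 10), 2.0), K, T_wc)
```

The pixel at the principal point (u = v = 5) backprojects to (0, 0, 2) in the camera frame and lands at (1, 0, 2) in the world.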

Later variants include P-MVSNet [3], Fast-MVSNet [4], and D2HC-RMVSNet [5].

NeRF

MVSNeRF

Plenoxels

References

  1. Robert T. Collins. A Space-Sweep Approach to True Multi-Image Matching. CVPR, 1996.
  2. Yao Yao et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo. ECCV, 2018.
  3. Keyang Luo et al. P-MVSNet: Learning Patch-wise Matching Confidence Aggregation for Multi-View Stereo. ICCV, 2019.
  4. Zehao Yu and Shenghua Gao. Fast-MVSNet: Sparse-to-Dense Multi-View Stereo with Learned Propagation and Gauss-Newton Refinement. CVPR, 2020.
  5. Jianfeng Yan et al. D2HC-RMVSNet: Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking. ECCV, 2020.
  6. Carl Olsson. Computer Vision: Lecture 11. 2019-02-26.