Multi-View Stereo
Assume we know the transformations from the reference camera to each of the physical cameras.
Plane-Sweeping Stereo
Introduced by Collins [1]
- Choose a reference view
- Sweep a family of planes at different depths with respect to the reference camera
Each plane defines a homography that warps each input image into the reference view. We build a cost volume over candidate disparities (corresponding to different candidate depths) and keep the depths with low SSD error across all other views (or any other photoconsistency measure).
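The sweep above can be sketched in NumPy. This is a minimal illustration, assuming known intrinsics `K`, a relative pose `(R, t)` from the reference to the source camera, grayscale images, and nearest-neighbor sampling; all function names are illustrative:

```python
import numpy as np

def plane_homography(K, R, t, d, n=np.array([0.0, 0.0, 1.0])):
    """Homography x' = H x induced by the fronto-parallel plane n^T X = d."""
    return K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

def warp_to_reference(src, H, shape):
    """Inverse-warp src into the reference view (nearest-neighbor sampling).
    Pixels that map outside src are filled with 0 (spurious cost, fine for a sketch)."""
    h, w = shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # reference pixels
    q = H @ pts                                             # map into source
    q = (q[:2] / q[2]).round().astype(int)
    valid = (q[0] >= 0) & (q[0] < src.shape[1]) & (q[1] >= 0) & (q[1] < src.shape[0])
    out = np.zeros(h * w)
    out[valid] = src[q[1, valid], q[0, valid]]
    return out.reshape(h, w)

def cost_volume(ref, src, K, R, t, depths):
    """Per-pixel SSD between ref and the src image warped at each candidate depth."""
    vol = np.empty((len(depths),) + ref.shape)
    for i, d in enumerate(depths):
        warped = warp_to_reference(src, plane_homography(K, R, t, d), ref.shape)
        vol[i] = (ref - warped) ** 2
    return vol  # depth estimate: depths[vol.argmin(axis=0)]
```

With more than two views, the per-view SSD volumes are simply summed before taking the per-pixel argmin over depth.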
Materials by Deva Ramanan at CMU (PDF), Linda Shapiro at UW (PDF), Lana Lazebnik at UNC (PDF), and Dan Huttenlocher at Cornell (PDF).
Given two cameras \(P = K [ I | 0]\) and \(P^\prime = K^\prime [ R | t]\) and a plane \(\pi = (n^T,d)^T\)
The homography \(x^\prime = Hx\) is defined as \(H = K^\prime (R - tn^T/d) K^{-1}\)
Yao et al. [2] generalize this in MVSNet, mapping the reference view (camera 1) to view \(i\) for the plane at depth \(d\): \(H_i(d) = K_i \cdot R_i \cdot \Bigg( I - \frac{(t_1 - t_i) \cdot n_1^T}{d} \Bigg) \cdot R_1^T \cdot K_1^{-1}\)
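This formula can be written directly in NumPy as a sanity check; the function name is illustrative, and note that when view \(i\) is the reference view itself the homography reduces to the identity:

```python
import numpy as np

def mvsnet_homography(K_i, R_i, t_i, K_1, R_1, t_1, d,
                      n_1=np.array([0.0, 0.0, 1.0])):
    """Homography H_i(d) mapping reference-view (camera 1) pixels to view i
    for the fronto-parallel plane at depth d, per the formula above."""
    return (K_i @ R_i
            @ (np.eye(3) - np.outer(t_1 - t_i, n_1) / d)
            @ R_1.T @ np.linalg.inv(K_1))
```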
Patchmatch Stereo
Randomized correspondence search: initialize each pixel with a random guess, then alternate propagating good guesses from neighbors with random refinement.
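PatchMatch Stereo proper estimates a slanted plane per pixel; as a simplified illustration, here is a 1-D variant for rectified pairs using fronto-parallel (scalar) disparities and an SSD patch cost. All names and parameters are illustrative:

```python
import numpy as np

def patch_cost(left, right, y, x, d, r=2):
    """SSD over a (2r+1)^2 patch between left[y, x] and right[y, x - d]."""
    h, w = left.shape
    if y - r < 0 or y + r >= h or x - r < 0 or x + r >= w \
            or x - d - r < 0 or x - d + r >= w:
        return np.inf
    a = left[y - r:y + r + 1, x - r:x + r + 1]
    b = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
    return float(((a - b) ** 2).sum())

def patchmatch(left, right, d_max, iters=3, rng=np.random.default_rng(0)):
    h, w = left.shape
    disp = rng.integers(0, d_max + 1, size=(h, w))  # random initialization
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                # candidates: current guess, spatial neighbors (propagation),
                # and a fresh random sample (refinement)
                cands = {disp[y, x],
                         disp[y, x - 1] if x else disp[y, x],
                         disp[y - 1, x] if y else disp[y, x],
                         int(rng.integers(0, d_max + 1))}
                disp[y, x] = min(cands, key=lambda d: patch_cost(left, right, y, x, d))
    return disp
```

Because good guesses flood-fill along the scan order, a few correct seeds from the random initialization are enough for most pixels to converge in a handful of iterations.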
MVSNet
Naive Fusion: Backproject each Depth Map to World Points
Since
\[\begin{bmatrix}u \cdot d \\ v \cdot d \\ d \end{bmatrix} = K_{ref} \, p_c,\] the camera-frame point is \[p_c = K_{ref}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \cdot d,\] where \(d\) is the depth at pixel \((u, v)\).
Get the point coordinates inside the world frame:
\[p_w = {}^wT_c \, p_c = {}^wT_c \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}\]
Learned variants: P-MVSNet [3], Fast MVSNet [4], D2HC-RMVSNet [5]
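The naive fusion step above can be sketched as follows, assuming a 4×4 camera-to-world pose `T_wc` \(\big({}^wT_c\big)\) and intrinsics `K_ref`; the function name is illustrative:

```python
import numpy as np

def backproject_depth(depth, K_ref, T_wc):
    """Backproject every pixel of a depth map through K_ref^{-1}, then move
    the camera-frame points into the world frame. Returns (H*W, 3) points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])   # (3, N) pixels
    p_c = np.linalg.inv(K_ref) @ pix * depth.ravel()         # camera frame
    p_c_h = np.vstack([p_c, np.ones(h * w)])                 # homogeneous
    return (T_wc @ p_c_h)[:3].T                              # world frame
```

Running this for every depth map and concatenating the results gives the fused (if redundant and noisy) world point cloud.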
NeRF
MVSNeRF
Plenoxels
References
- Robert Collins. A Space-Sweep Approach to True Multi-Image Matching. CVPR, 1996. PDF.
- Yao Yao et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo. ECCV, 2018. PDF.
- Luo et al. P-MVSNet. ICCV, 2019. PDF.
- Yu et al. Fast-MVSNet. CVPR, 2020. PDF.
- Yan et al. D2HC-RMVSNet. ECCV, 2020. PDF.
- Carl Olsson. Computer Vision: Lecture 11. 2019-02-26. PDF.