Details
Original language | English |
---|---|
Pages (from - to) | 201-215 |
Number of pages | 15 |
Journal | ISPRS Journal of Photogrammetry and Remote Sensing |
Volume | 157 |
Early online date | 27 Sept 2019 |
Publication status | Published - Nov 2019 |
Abstract
Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has made great progress by formulating dense stereo matching as a pixel-wise learning task solved with a deep convolutional neural network (CNN). However, most estimation methods, both traditional and deep-learning-based, still have difficulty handling challenging real-world scenarios, especially those containing large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. Our improvements comprise three major contributions. First, we use dilated convolutions to build a context pyramidal feature extraction module. A dilated convolution enlarges the receptive field used for feature extraction and aggregates more contextual information, which makes our network more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity maps at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information into the loss function as a gradient regularizer, which preserves local structural details in areas with large depth discontinuities. We evaluate our model in terms of end-point error (EPE) on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model reduces the estimation error compared with DispNet on most datasets (e.g., an improvement of 46% on Sintel) and produces disparity maps that better preserve structure. Moreover, our approach also achieves competitive performance compared to other methods.
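The first contribution relies on dilated convolutions. The following minimal NumPy sketch, which is not the authors' implementation, shows a single-channel dilated convolution and a hypothetical `context_pyramid` that applies one 3×3 kernel at several dilation rates and stacks the outputs. The rates (1, 2, 4, 8), the shared kernel and the single-channel input are illustrative assumptions. The point it demonstrates is that a k×k kernel at rate r covers a window of r(k−1)+1 pixels per side while keeping only k² weights.

```python
import numpy as np


def dilated_conv2d(x, w, rate):
    """Zero-padded 'same' 2D dilated cross-correlation of a single-channel map.

    x: (H, W) input, w: (k, k) kernel with odd k, rate: dilation factor >= 1.
    """
    k = w.shape[0]
    pad = rate * (k // 2)  # half of the effective extent rate*(k-1)+1
    xp = np.pad(x.astype(float), pad, mode="constant")
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(k):
        for j in range(k):
            # Kernel taps are spaced `rate` pixels apart in the input.
            out += w[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out


def receptive_field(k, rate):
    """Side length of the window covered by one k x k kernel at the given rate."""
    return rate * (k - 1) + 1


def context_pyramid(x, w, rates=(1, 2, 4, 8)):
    """Hypothetical context pyramid: one kernel applied at several dilation
    rates, outputs stacked along a new leading (channel) axis."""
    return np.stack([dilated_conv2d(x, w, r) for r in rates], axis=0)


# An impulse reveals which input pixels each output pixel can "see".
img = np.zeros((21, 21))
img[10, 10] = 1.0
kernel = np.ones((3, 3))
features = context_pyramid(img, kernel)
for r, f in zip((1, 2, 4, 8), features):
    rows = np.nonzero(f.any(axis=1))[0]
    print(r, receptive_field(3, r), rows.max() - rows.min() + 1)
```

Feeding an impulse through the pyramid shows the covered window growing from 3 to 17 pixels per side as the rate goes from 1 to 8, with the same nine weights at every rate. Stacking such responses is one simple way to combine context from several scales.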
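The second contribution builds the matching cost volume by correlating left and right features. As a rough illustration, this sketch implements a FlowNet/DispNet-style correlation restricted to horizontal shifts (as is usual for rectified stereo pairs), summing feature dot products over a small patch. The function name, the zero handling at image borders, the normalisation and the winner-take-all read-out are assumptions made for the example; the paper's exact patch-based scheme for larger disparities may differ.

```python
import numpy as np


def correlation_cost_volume(fl, fr, max_disp, patch=3):
    """1D (horizontal) patch correlation between left and right feature maps.

    fl, fr: (C, H, W) features of the rectified left and right images.
    Returns cost of shape (max_disp + 1, H, W), where cost[d, y, x] sums the
    dot products of left features around (y, x) with right features around
    (y, x - d) over a patch x patch window; out-of-image samples count as zero.
    """
    C, H, W = fl.shape
    r = patch // 2
    # Per-pixel feature dot product for every candidate disparity d.
    dots = np.zeros((max_disp + 1, H, W))
    for d in range(max_disp + 1):
        if d == 0:
            dots[d] = (fl * fr).sum(axis=0)
        else:
            dots[d, :, d:] = (fl[:, :, d:] * fr[:, :, :-d]).sum(axis=0)
    # Aggregate the dot products over the patch (a zero-padded box filter).
    padded = np.pad(dots, ((0, 0), (r, r), (r, r)))
    cost = np.zeros_like(dots)
    for i in range(patch):
        for j in range(patch):
            cost += padded[:, i:i + H, j:j + W]
    return cost / (C * patch * patch)


# Synthetic check: left features are the right features shifted by 5 pixels.
rng = np.random.default_rng(0)
fr = rng.standard_normal((32, 16, 40))
true_d = 5
fl = np.roll(fr, true_d, axis=2)  # fl[:, :, x] == fr[:, :, x - 5] for x >= 5
cost = correlation_cost_volume(fl, fr, max_disp=8)
disp = cost.argmax(axis=0)  # winner-take-all disparity per pixel
print(cost.shape, np.unique(disp[:, 8:]))
```

Away from the left border, the arg-max over the disparity axis recovers d = 5. In DispNet-style networks the cost volume is not read out this way but is passed to further convolutional layers that regress the disparity.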
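For the third contribution, the abstract states only that disparity gradient information enters the loss as a regularizer. The sketch below assumes one plausible form: the end-point error plus an L1 penalty on the difference between predicted and ground-truth forward-difference gradients, weighted by a hypothetical `lam`. The paper's exact formulation and weights may differ. For disparity maps the end-point error reduces to the mean absolute difference, optionally restricted to pixels with valid ground truth (as with the sparse KITTI maps).

```python
import numpy as np


def end_point_error(pred, gt, valid=None):
    """Mean end-point error for disparity maps: |pred - gt| over valid pixels."""
    err = np.abs(pred - gt)
    if valid is None:
        return float(err.mean())
    return float(err[valid].mean())


def image_gradients(d):
    """Forward differences along x and y (shapes (H, W-1) and (H-1, W))."""
    return d[:, 1:] - d[:, :-1], d[1:, :] - d[:-1, :]


def gradient_regularized_loss(pred, gt, lam=1.0):
    """Assumed form: EPE data term plus an L1 penalty on the difference between
    predicted and ground-truth disparity gradients (lam is a hypothetical weight)."""
    gx_p, gy_p = image_gradients(pred)
    gx_t, gy_t = image_gradients(gt)
    grad_term = np.abs(gx_p - gx_t).mean() + np.abs(gy_p - gy_t).mean()
    return end_point_error(pred, gt) + lam * float(grad_term)


gt = np.full((8, 8), 10.0)
gt[:, 4:] = 30.0            # step edge: a depth discontinuity
blurred = gt.copy()
blurred[:, 3:5] = 20.0      # edge smeared over two columns
offset = gt + 2.5           # sharp edge, constant bias
for name, pred in (("blurred", blurred), ("offset", offset)):
    print(name, end_point_error(pred, gt), gradient_regularized_loss(pred, gt))
```

Both predictions have the same end-point error of 2.5 px, but only the one with the smeared edge is penalised by the gradient term. This illustrates why such a regularizer can favour sharp, structure-preserving disparity maps without a separate smoothing post-process.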
ASJC Scopus subject areas
- Physics and Astronomy (all)
- Atomic and Molecular Physics, and Optics
- Engineering (all)
- Engineering (miscellaneous)
- Computer Science (all)
- Computer Science Applications
- Earth and Planetary Sciences (all)
- Computers in Earth Sciences
Cite
In: ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 157, 11.2019, p. 201-215.
Publication: Contribution to journal › Article › Research › Peer-reviewed
TY - JOUR
T1 - Context pyramidal network for stereo matching regularized by disparity gradients
AU - Kang, Junhua
AU - Chen, Lin
AU - Deng, Fei
AU - Heipke, Christian
N1 - Funding information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her study at the Institute of Photogrammetry and GeoInformation, Leibniz Universität Hannover, Germany, as a visiting PhD student. Furthermore, we gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.
PY - 2019/11
Y1 - 2019/11
KW - Dilated convolution
KW - Gradient regularizer
KW - Stereo matching
KW - Structure preserving
UR - http://www.scopus.com/inward/record.url?scp=85072713822&partnerID=8YFLogxK
U2 - 10.1016/j.isprsjprs.2019.09.012
DO - 10.1016/j.isprsjprs.2019.09.012
M3 - Article
AN - SCOPUS:85072713822
VL - 157
SP - 201
EP - 215
JO - ISPRS Journal of Photogrammetry and Remote Sensing
JF - ISPRS Journal of Photogrammetry and Remote Sensing
SN - 0924-2716
ER -