Context pyramidal network for stereo matching regularized by disparity gradients

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Junhua Kang
  • Lin Chen
  • Fei Deng
  • Christian Heipke

External Research Organisations

  • Wuhan University

Details

Original language: English
Pages (from-to): 201-215
Number of pages: 15
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 157
Early online date: 27 Sept 2019
Publication status: Published - Nov 2019

Abstract

Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has achieved great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). However, most estimation methods, including traditional and deep learning approaches, still have difficulty handling challenging real-world scenarios, especially those with large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. The improvements consist of three major contributions. First, we use dilated convolutions to develop a context pyramidal feature extraction module. A dilated convolution expands the receptive field when extracting features and aggregates more contextual information, which allows our network to be more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information as a gradient regularizer into the loss function to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model decreases the estimation error compared with DispNet on most datasets (e.g. we obtain an improvement of 46% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal achieves competitive performance compared to other methods.
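The abstract names two quantities without giving their formulas: the end-point error used for evaluation and a gradient regularizer added to the loss. The sketch below is one plausible reading, not the paper's actual formulation. It assumes an L1 end-point error, forward-difference disparity gradients, and a hypothetical weight of 0.5; all function names are illustrative.

```python
from typing import List, Tuple

Disparity = List[List[float]]  # row-major disparity map, one value per pixel


def end_point_error(pred: Disparity, truth: Disparity) -> float:
    """Mean absolute disparity difference over all pixels."""
    diffs = [abs(p - t)
             for prow, trow in zip(pred, truth)
             for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)


def gradients(d: Disparity) -> Tuple[Disparity, Disparity]:
    """Forward differences along x (columns) and y (rows)."""
    gx = [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in d]
    gy = [[d[i + 1][j] - d[i][j] for j in range(len(d[0]))]
          for i in range(len(d) - 1)]
    return gx, gy


def gradient_term(pred: Disparity, truth: Disparity) -> float:
    """Mean absolute difference between predicted and reference gradients."""
    pgx, pgy = gradients(pred)
    tgx, tgy = gradients(truth)
    diffs = [abs(a - b)
             for prow, trow in zip(pgx + pgy, tgx + tgy)
             for a, b in zip(prow, trow)]
    return sum(diffs) / len(diffs)


def regularized_loss(pred: Disparity, truth: Disparity,
                     weight: float = 0.5) -> float:
    """End-point error plus a weighted gradient penalty (weight is assumed)."""
    return end_point_error(pred, truth) + weight * gradient_term(pred, truth)


# A step edge (depth discontinuity) and two imperfect predictions of it.
truth = [[1, 1, 5, 5], [1, 1, 5, 5]]
offset = [[2, 2, 6, 6], [2, 2, 6, 6]]    # sharp edge, constant bias of 1
blurred = [[1, 2, 4, 5], [1, 2, 4, 5]]   # correct levels, smeared edge

print(end_point_error(offset, truth), gradient_term(offset, truth))
print(end_point_error(blurred, truth), gradient_term(blurred, truth))
```

In this toy case the constant-offset prediction has a gradient term of zero because its edge is intact, while the blurred prediction is penalized even though its end-point error is lower. That is the structure-preserving behaviour the abstract attributes to the regularizer.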

Keywords

    Dilated convolution, Gradient regularizer, Stereo matching, Structure preserving

Cite this

Context pyramidal network for stereo matching regularized by disparity gradients. / Kang, Junhua; Chen, Lin; Deng, Fei et al.
In: ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 157, 11.2019, p. 201-215.

Kang J, Chen L, Deng F, Heipke C. Context pyramidal network for stereo matching regularized by disparity gradients. ISPRS Journal of Photogrammetry and Remote Sensing. 2019 Nov;157:201-215. Epub 2019 Sept 27. doi: 10.1016/j.isprsjprs.2019.09.012
@article{e748a47ed1d74e3497b59aaef71019f5,
title = "Context pyramidal network for stereo matching regularized by disparity gradients",
abstract = "Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has achieved great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). However, most estimation methods, including traditional and deep learning approaches, still have difficulty handling challenging real-world scenarios, especially those with large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. The improvements consist of three major contributions. First, we use dilated convolutions to develop a context pyramidal feature extraction module. A dilated convolution expands the receptive field when extracting features and aggregates more contextual information, which allows our network to be more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information as a gradient regularizer into the loss function to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model decreases the estimation error compared with DispNet on most datasets (e.g. we obtain an improvement of 46\% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal achieves competitive performance compared to other methods.",
keywords = "Dilated convolution, Gradient regularizer, Stereo matching, Structure preserving",
author = "Junhua Kang and Lin Chen and Fei Deng and Christian Heipke",
note = "Funding information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her study at the Institute of Photogrammetry and GeoInformation, Leibniz Universit{\"a}t Hannover, Germany, as a visiting PhD student. Furthermore, we gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.",
year = "2019",
month = nov,
doi = "10.1016/j.isprsjprs.2019.09.012",
language = "English",
volume = "157",
pages = "201--215",
journal = "ISPRS Journal of Photogrammetry and Remote Sensing",
issn = "0924-2716",
publisher = "Elsevier",

}

TY - JOUR

T1 - Context pyramidal network for stereo matching regularized by disparity gradients

AU - Kang, Junhua

AU - Chen, Lin

AU - Deng, Fei

AU - Heipke, Christian

N1 - Funding information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her study at the Institute of Photogrammetry and GeoInformation, Leibniz Universität Hannover, Germany, as a visiting PhD student. Furthermore, we gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.

PY - 2019/11

Y1 - 2019/11

AB - Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has achieved great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). However, most estimation methods, including traditional and deep learning approaches, still have difficulty handling challenging real-world scenarios, especially those with large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. The improvements consist of three major contributions. First, we use dilated convolutions to develop a context pyramidal feature extraction module. A dilated convolution expands the receptive field when extracting features and aggregates more contextual information, which allows our network to be more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information as a gradient regularizer into the loss function to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model decreases the estimation error compared with DispNet on most datasets (e.g. we obtain an improvement of 46% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal achieves competitive performance compared to other methods.

KW - Dilated convolution

KW - Gradient regularizer

KW - Stereo matching

KW - Structure preserving

UR - http://www.scopus.com/inward/record.url?scp=85072713822&partnerID=8YFLogxK

U2 - 10.1016/j.isprsjprs.2019.09.012

DO - 10.1016/j.isprsjprs.2019.09.012

M3 - Article

AN - SCOPUS:85072713822

VL - 157

SP - 201

EP - 215

JO - ISPRS Journal of Photogrammetry and Remote Sensing

JF - ISPRS Journal of Photogrammetry and Remote Sensing

SN - 0924-2716

ER -