Context pyramidal network for stereo matching regularized by disparity gradients

Publication: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Junhua Kang
  • Lin Chen
  • Fei Deng
  • Christian Heipke

External organisations

  • Wuhan University

Details

Original language: English
Pages (from-to): 201-215
Number of pages: 15
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 157
Early online date: 27 Sep 2019
Publication status: Published - Nov 2019

Abstract

Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has made great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). However, most estimation methods, both traditional and deep learning approaches, still have difficulty handling challenging real-world scenarios, especially those with large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. The improvements consist of three major contributions. First, we use dilated convolutions to develop a context pyramidal feature extraction module. A dilated convolution expands the receptive field when extracting features and aggregates more contextual information, which makes our network more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information as a gradient regularizer into the loss function to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model decreases the estimation error compared with DispNet on most datasets (e.g. we obtain an improvement of 46% on Sintel) and estimates disparity maps that better preserve structure. Moreover, our approach also achieves competitive performance compared to other methods.
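The gradient regularizer described in the abstract can be illustrated with a minimal sketch: a data term on the disparity error plus a penalty on the mismatch between predicted and ground-truth disparity gradients. This is not the paper's exact formulation; the finite-difference scheme, the L1 norms, and the weight `lam` are assumptions for illustration only.

```python
import numpy as np

def gradient_regularized_loss(pred, gt, lam=0.1):
    """L1 disparity error plus an L1 penalty on the difference of
    disparity gradients (hypothetical weighting `lam`; the paper's
    exact loss may differ)."""
    # Data term: mean absolute disparity error.
    data_term = np.mean(np.abs(pred - gt))
    # Horizontal and vertical finite-difference gradients.
    gx = lambda d: d[:, 1:] - d[:, :-1]
    gy = lambda d: d[1:, :] - d[:-1, :]
    # Regularizer: penalize gradient mismatch, which discourages the
    # network from blurring disparity edges at depth discontinuities.
    grad_term = (np.mean(np.abs(gx(pred) - gx(gt)))
                 + np.mean(np.abs(gy(pred) - gy(gt))))
    return data_term + lam * grad_term
```

A perfect prediction yields a loss of zero, while a prediction that matches the disparities on average but smooths over an edge is still penalized through the gradient term.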

ASJC Scopus subject areas

Cite this

Context pyramidal network for stereo matching regularized by disparity gradients. / Kang, Junhua; Chen, Lin; Deng, Fei et al.
In: ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 157, 11.2019, p. 201-215.


Kang J, Chen L, Deng F, Heipke C. Context pyramidal network for stereo matching regularized by disparity gradients. ISPRS Journal of Photogrammetry and Remote Sensing. 2019 Nov;157:201-215. Epub 2019 Sep 27. doi: 10.1016/j.isprsjprs.2019.09.012
@article{e748a47ed1d74e3497b59aaef71019f5,
title = "Context pyramidal network for stereo matching regularized by disparity gradients",
abstract = "Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has made great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). However, most estimation methods, both traditional and deep learning approaches, still have difficulty handling challenging real-world scenarios, especially those with large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. The improvements consist of three major contributions. First, we use dilated convolutions to develop a context pyramidal feature extraction module. A dilated convolution expands the receptive field when extracting features and aggregates more contextual information, which makes our network more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information as a gradient regularizer into the loss function to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model decreases the estimation error compared with DispNet on most datasets (e.g. we obtain an improvement of 46% on Sintel) and estimates disparity maps that better preserve structure. Moreover, our approach also achieves competitive performance compared to other methods.",
keywords = "Dilated convolution, Gradient regularizer, Stereo matching, Structure preserving",
author = "Junhua Kang and Lin Chen and Fei Deng and Christian Heipke",
note = "Funding information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her study at the Institute of Photogrammetry and GeoInformation, Leibniz Universit{\"a}t Hannover, Germany, as a visiting PhD student. Furthermore, we gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.",
year = "2019",
month = nov,
doi = "10.1016/j.isprsjprs.2019.09.012",
language = "English",
volume = "157",
pages = "201--215",
journal = "ISPRS Journal of Photogrammetry and Remote Sensing",
issn = "0924-2716",
publisher = "Elsevier",

}


TY - JOUR

T1 - Context pyramidal network for stereo matching regularized by disparity gradients

AU - Kang, Junhua

AU - Chen, Lin

AU - Deng, Fei

AU - Heipke, Christian

N1 - Funding information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her study at the Institute of Photogrammetry and GeoInformation, Leibniz Universität Hannover, Germany, as a visiting PhD student. Furthermore, we gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.

PY - 2019/11

Y1 - 2019/11

N2 - Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has made great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). However, most estimation methods, both traditional and deep learning approaches, still have difficulty handling challenging real-world scenarios, especially those with large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. The improvements consist of three major contributions. First, we use dilated convolutions to develop a context pyramidal feature extraction module. A dilated convolution expands the receptive field when extracting features and aggregates more contextual information, which makes our network more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information as a gradient regularizer into the loss function to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model decreases the estimation error compared with DispNet on most datasets (e.g. we obtain an improvement of 46% on Sintel) and estimates disparity maps that better preserve structure. Moreover, our approach also achieves competitive performance compared to other methods.

AB - Even after many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has made great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). However, most estimation methods, both traditional and deep learning approaches, still have difficulty handling challenging real-world scenarios, especially those with large depth discontinuities and low-texture areas. To tackle these problems, we investigate a recently proposed end-to-end disparity learning network, DispNet (Mayer et al., 2015), and improve it to yield better results in these problematic areas. The improvements consist of three major contributions. First, we use dilated convolutions to develop a context pyramidal feature extraction module. A dilated convolution expands the receptive field when extracting features and aggregates more contextual information, which makes our network more robust in weakly textured areas. Second, we construct the matching cost volume with patch-based correlation to handle larger disparities. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Third, instead of using post-processing steps to impose smoothness in the presence of depth discontinuities, we incorporate disparity gradient information as a gradient regularizer into the loss function to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets, including Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model decreases the estimation error compared with DispNet on most datasets (e.g. we obtain an improvement of 46% on Sintel) and estimates disparity maps that better preserve structure. Moreover, our approach also achieves competitive performance compared to other methods.

KW - Dilated convolution

KW - Gradient regularizer

KW - Stereo matching

KW - Structure preserving

UR - http://www.scopus.com/inward/record.url?scp=85072713822&partnerID=8YFLogxK

U2 - 10.1016/j.isprsjprs.2019.09.012

DO - 10.1016/j.isprsjprs.2019.09.012

M3 - Article

AN - SCOPUS:85072713822

VL - 157

SP - 201

EP - 215

JO - ISPRS Journal of Photogrammetry and Remote Sensing

JF - ISPRS Journal of Photogrammetry and Remote Sensing

SN - 0924-2716

ER -