Encoder-Decoder network for local structure preserving stereo matching

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Authors

  • Junhua Kang
  • Lin Chen
  • Fei Deng
  • Christian Heipke

External Research Organisations

  • Wuhan University

Details

Original language: English
Title of host publication: Beiträge
Subtitle of host publication: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V.
Editors: Thomas P. Kersten
Place of Publication: München
Pages: 363-374
Number of pages: 12
Publication status: Published - 2019
Event: Dreiländertagung OVG-DGPF-SGPF: Photogrammetrie-Fernerkundung-Geoinformation - Universität für Bodenkultur Wien, Wien, Austria
Duration: 20 Feb 2019 to 22 Feb 2019
https://dgpf.de/con/jt2019.html

Publication series

Name: Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.
Publisher: Deutsche Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.
Volume: 28
ISSN (Print): 0942-2870

Abstract

After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements comprise two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.
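
The two quantities named in the abstract, end-point error and a disparity-gradient term, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact form of the regularizer, so the forward-difference gradients, the L1 comparison, the function names and the `weight` parameter below are all assumptions made for illustration. For disparity maps, the end-point error reduces to the mean absolute disparity difference.

```python
from typing import List

Disparity = List[List[float]]  # row-major disparity map in pixels, at least 2x2


def end_point_error(pred: Disparity, truth: Disparity) -> float:
    """Mean absolute disparity difference over all pixels."""
    diffs = [abs(p - t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def horizontal_gradient(d: Disparity) -> Disparity:
    """Forward difference along each row (result is one column narrower)."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in d]


def vertical_gradient(d: Disparity) -> Disparity:
    """Forward difference along each column (result is one row shorter)."""
    return [[d[y + 1][x] - d[y][x] for x in range(len(d[0]))]
            for y in range(len(d) - 1)]


def gradient_error(pred: Disparity, truth: Disparity) -> float:
    """Assumed regularizer: L1 distance between predicted and reference gradients."""
    gx = end_point_error(horizontal_gradient(pred), horizontal_gradient(truth))
    gy = end_point_error(vertical_gradient(pred), vertical_gradient(truth))
    return gx + gy


def total_loss(pred: Disparity, truth: Disparity, weight: float = 0.5) -> float:
    """End-point error plus a weighted gradient term (weight is hypothetical)."""
    return end_point_error(pred, truth) + weight * gradient_error(pred, truth)


# A sharp depth edge, and two predictions with the same end-point error.
truth = [[0.0, 0.0, 4.0, 4.0], [0.0, 0.0, 4.0, 4.0]]
blurred = [[0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 3.0, 4.0]]  # edge smeared out
offset = [[0.5, 0.5, 4.5, 4.5], [0.5, 0.5, 4.5, 4.5]]   # edge kept, constant bias

print(end_point_error(blurred, truth), end_point_error(offset, truth))
print(gradient_error(blurred, truth), gradient_error(offset, truth))
```

Both predictions have the same end-point error, but only the blurred one is penalised by the gradient term. This is the sense in which a gradient-based regularizer can favour predictions that keep depth discontinuities sharp.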

Cite this

Encoder-Decoder network for local structure preserving stereo matching. / Kang, Junhua; Chen, Lin; Deng, Fei et al.
Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. ed. / Thomas P. Kersten. München, 2019. p. 363-374 (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.; Vol. 28).

Kang, J, Chen, L, Deng, F & Heipke, C 2019, Encoder-Decoder network for local structure preserving stereo matching. in TP Kersten (ed.), Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V., vol. 28, München, pp. 363-374, Dreiländertagung OVG-DGPF-SGPF, Wien, Austria, 20 Feb 2019. <https://dgpf.de/src/tagung/jt2019/proceedings/proceedings/papers/67_3LT2019_Kang_et_al.pdf>
Kang, J., Chen, L., Deng, F., & Heipke, C. (2019). Encoder-Decoder network for local structure preserving stereo matching. In T. P. Kersten (Ed.), Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. (pp. 363-374). (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.; Vol. 28). https://dgpf.de/src/tagung/jt2019/proceedings/proceedings/papers/67_3LT2019_Kang_et_al.pdf
Kang J, Chen L, Deng F, Heipke C. Encoder-Decoder network for local structure preserving stereo matching. In Kersten TP, editor, Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. München. 2019. p. 363-374. (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.).
Kang, Junhua ; Chen, Lin ; Deng, Fei et al. / Encoder-Decoder network for local structure preserving stereo matching. Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. editor / Thomas P. Kersten. München, 2019. pp. 363-374 (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.).
@inproceedings{4252b49b1bd44f02abb297de1c41dc41,
title = "Encoder-Decoder network for local structure preserving stereo matching",
abstract = "After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements comprise two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.",
author = "Junhua Kang and Lin Chen and Fei Deng and Christian Heipke",
note = "Funding Information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her as a visiting PhD student at Leibniz University Hannover, Germany. We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.; Dreil{\"a}ndertagung OVG-DGPF-SGPF ; Conference date: 20-02-2019 Through 22-02-2019",
year = "2019",
language = "English",
series = "Publikationen der Deutschen Gesellschaft f{\"u}r Photogrammetrie, Fernerkundung und Geoinformation e.V.",
publisher = "Deutsche Gesellschaft f{\"u}r Photogrammetrie, Fernerkundung und Geoinformation e.V.",
pages = "363--374",
editor = "Kersten, {Thomas P.}",
booktitle = "Beitr{\"a}ge",
url = "https://dgpf.de/con/jt2019.html",

}

TY - GEN

T1 - Encoder-Decoder network for local structure preserving stereo matching

AU - Kang, Junhua

AU - Chen, Lin

AU - Deng, Fei

AU - Heipke, Christian

N1 - Funding Information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her as a visiting PhD student at Leibniz University Hannover, Germany. We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.

PY - 2019

Y1 - 2019

N2 - After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements comprise two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.

AB - After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements comprise two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.

M3 - Conference contribution

T3 - Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.

SP - 363

EP - 374

BT - Beiträge

A2 - Kersten, Thomas P.

CY - München

T2 - Dreiländertagung OVG-DGPF-SGPF

Y2 - 20 February 2019 through 22 February 2019

ER -