Encoder-Decoder network for local structure preserving stereo matching

Junhua Kang; Lin Chen; Fei Deng; Christian Heipke

Details

Original language	English
Title of host publication	Beiträge
Subtitle of host publication	39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V.
Editors	Thomas P. Kersten
Place of Publication	München
Pages	363-374
Number of pages	12
Publication status	Published - 2019
Event	Dreiländertagung OVG-DGPF-SGPF: Photogrammetrie-Fernerkundung-Geoinformation - Universität für Bodenkultur Wien, Wien, Austria. Duration: 20 Feb 2019 → 22 Feb 2019. https://dgpf.de/con/jt2019.html

Publication series

Name	Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.
Publisher	Deutsche Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.
Volume	28
ISSN (Print)	0942-2870

Abstract

After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements consist of two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.
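
To illustrate the two quantities the abstract refers to, the following is a minimal sketch of an end-point error and a disparity-gradient error on small disparity maps. It is not the authors' implementation: the function names, the forward-difference gradient, the L1 form of the gradient term and the example values are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical type alias: a disparity map stored as rows of floats.
Disparity = List[List[float]]


def end_point_error(pred: Disparity, truth: Disparity) -> float:
    """Mean absolute disparity difference over all pixels."""
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            total += abs(p - t)
            count += 1
    return total / count if count else 0.0


def gradients(disp: Disparity) -> Tuple[Disparity, Disparity]:
    """Forward differences along x (columns) and y (rows)."""
    gx = [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in disp]
    gy = [
        [disp[y + 1][x] - disp[y][x] for x in range(len(disp[0]))]
        for y in range(len(disp) - 1)
    ]
    return gx, gy


def gradient_error(pred: Disparity, truth: Disparity) -> float:
    """Assumed L1 penalty on the difference between disparity gradients."""
    gx_p, gy_p = gradients(pred)
    gx_t, gy_t = gradients(truth)
    return end_point_error(gx_p, gx_t) + end_point_error(gy_p, gy_t)


# A sharp depth edge, a blurred estimate, and an estimate with a constant offset.
truth = [[0.0, 0.0, 4.0, 4.0], [0.0, 0.0, 4.0, 4.0]]
blurred = [[0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 3.0, 4.0]]
shifted = [[0.5, 0.5, 4.5, 4.5], [0.5, 0.5, 4.5, 4.5]]

for name, pred in (("blurred", blurred), ("shifted", shifted)):
    print(name, end_point_error(pred, truth), gradient_error(pred, truth))
```

In this toy case both estimates have the same end-point error, but only the blurred one is penalised by the gradient term. This suggests why a gradient-based regularizer can favour estimates that keep depth edges sharp.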

Cite this

Encoder-Decoder network for local structure preserving stereo matching. / Kang, Junhua; Chen, Lin; Deng, Fei et al.
Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V.. ed. / Thomas P. Kersten. München, 2019. p. 363-374 (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.; Vol. 28).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Kang, J, Chen, L, Deng, F & Heipke, C 2019, Encoder-Decoder network for local structure preserving stereo matching. in TP Kersten (ed.), Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V.. Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V., vol. 28, München, pp. 363-374, Dreiländertagung OVG-DGPF-SGPF, Wien, Austria, 20 Feb 2019. <https://dgpf.de/src/tagung/jt2019/proceedings/proceedings/papers/67_3LT2019_Kang_et_al.pdf>

Kang, J., Chen, L., Deng, F., & Heipke, C. (2019). Encoder-Decoder network for local structure preserving stereo matching. In T. P. Kersten (Ed.), Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. (pp. 363-374). (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.; Vol. 28). https://dgpf.de/src/tagung/jt2019/proceedings/proceedings/papers/67_3LT2019_Kang_et_al.pdf

Kang J, Chen L, Deng F, Heipke C. Encoder-Decoder network for local structure preserving stereo matching. In Kersten TP, editor, Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V.. München. 2019. p. 363-374. (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.).

Kang, Junhua ; Chen, Lin ; Deng, Fei et al. / Encoder-Decoder network for local structure preserving stereo matching. Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V.. editor / Thomas P. Kersten. München, 2019. pp. 363-374 (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.).

@inproceedings{4252b49b1bd44f02abb297de1c41dc41,

title = "Encoder-Decoder network for local structure preserving stereo matching",

abstract = "After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements consist of two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.",

author = "Junhua Kang and Lin Chen and Fei Deng and Christian Heipke",

note = "Funding Information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her as a visiting PhD student at Leibniz University Hannover, Germany. We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.; Dreil{\"a}ndertagung OVG-DGPF-SGPF ; Conference date: 20-02-2019 Through 22-02-2019",

year = "2019",

language = "English",

series = "Publikationen der Deutschen Gesellschaft f{\"u}r Photogrammetrie, Fernerkundung und Geoinformation e.V.",

publisher = "Deutsche Gesellschaft f{\"u}r Photogrammetrie, Fernerkundung und Geoinformation e.V.",

pages = "363--374",

editor = "Kersten, {Thomas P.}",

booktitle = "Beitr{\"a}ge",

url = "https://dgpf.de/con/jt2019.html",

}

TY - GEN

T1 - Encoder-Decoder network for local structure preserving stereo matching

AU - Kang, Junhua

AU - Chen, Lin

AU - Deng, Fei

AU - Heipke, Christian

N1 - Funding Information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her as a visiting PhD student at Leibniz University Hannover, Germany. We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.

PY - 2019

Y1 - 2019

N2 - After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements consist of two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.

AB - After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements consist of two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.

M3 - Conference contribution

T3 - Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.

SP - 363

EP - 374

BT - Beiträge

A2 - Kersten, Thomas P.

CY - München

T2 - Dreiländertagung OVG-DGPF-SGPF

Y2 - 20 February 2019 through 22 February 2019

ER -
