Encoder-Decoder network for local structure preserving stereo matching

Publication: Contribution to book/report/anthology/conference proceedings, Conference paper, Research

Authors

  • Junhua Kang
  • Lin Chen
  • Fei Deng
  • Christian Heipke

External organisations

  • Wuhan University

Details

Original language: English
Title of host publication: Beiträge
Subtitle: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V.
Editor: Thomas P. Kersten
Place of publication: Munich
Pages: 363-374
Number of pages: 12
Publication status: Published - 2019
Event: Dreiländertagung OVG-DGPF-SGPF: Photogrammetrie-Fernerkundung-Geoinformation - Universität für Bodenkultur Wien, Vienna, Austria
Duration: 20 Feb. 2019 to 22 Feb. 2019
https://dgpf.de/con/jt2019.html

Publication series

Name: Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.
Publisher: Deutsche Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.
Volume: 28
ISSN (Print): 0942-2870

Abstract

After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements comprise two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.
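
The abstract names two quantities without giving formulas: the end-point error used for evaluation and a disparity-gradient regularizer. The sketch below is one plausible reading, not the authors' implementation. It assumes the end-point error is the mean absolute disparity difference over valid pixels, that gradients are forward differences, and that the regularizer is an L1 penalty on gradient differences; the function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def end_point_error(pred, gt, valid=None):
    """Mean absolute disparity difference over valid pixels (assumed definition)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: pixels without ground truth are marked as NaN or inf.
        valid = np.isfinite(gt)
    return float(np.abs(pred - gt)[valid].mean())


def disparity_gradients(disp):
    """Forward differences along x and y, cropped to a common shape."""
    disp = np.asarray(disp, dtype=np.float64)
    gx = disp[:-1, 1:] - disp[:-1, :-1]
    gy = disp[1:, :-1] - disp[:-1, :-1]
    return gx, gy


def gradient_regularizer(pred, gt):
    """L1 difference between predicted and reference disparity gradients."""
    pgx, pgy = disparity_gradients(pred)
    ggx, ggy = disparity_gradients(gt)
    return float((np.abs(pgx - ggx) + np.abs(pgy - ggy)).mean())


def total_loss(pred, gt, weight=0.5):
    """Data term plus weighted gradient term; `weight` is a hypothetical value."""
    return end_point_error(pred, gt) + weight * gradient_regularizer(pred, gt)
```

Under these assumptions, a prediction that blurs a depth edge and one that is merely offset by a constant can have the same end-point error, but only the blurred one is penalised by the gradient term. That is the sense in which such a regularizer favours sharp discontinuities.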

Cite this

Encoder-Decoder network for local structure preserving stereo matching. / Kang, Junhua; Chen, Lin; Deng, Fei et al.
Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. Ed. / Thomas P. Kersten. Munich, 2019. pp. 363-374 (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.; Vol. 28).


Kang, J, Chen, L, Deng, F & Heipke, C 2019, Encoder-Decoder network for local structure preserving stereo matching. in TP Kersten (ed.), Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V., vol. 28, Munich, pp. 363-374, Dreiländertagung OVG-DGPF-SGPF, Vienna, Austria, 20 Feb. 2019. <https://dgpf.de/src/tagung/jt2019/proceedings/proceedings/papers/67_3LT2019_Kang_et_al.pdf>
Kang, J., Chen, L., Deng, F., & Heipke, C. (2019). Encoder-Decoder network for local structure preserving stereo matching. In T. P. Kersten (Ed.), Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. (pp. 363-374). (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.; Vol. 28). https://dgpf.de/src/tagung/jt2019/proceedings/proceedings/papers/67_3LT2019_Kang_et_al.pdf
Kang J, Chen L, Deng F, Heipke C. Encoder-Decoder network for local structure preserving stereo matching. In Kersten TP, editor, Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. Munich. 2019. pp. 363-374. (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.).
Kang, Junhua ; Chen, Lin ; Deng, Fei et al. / Encoder-Decoder network for local structure preserving stereo matching. Beiträge: 39. Wissenschaftlich-Technische Jahrestagung der DGPF e.V. Ed. / Thomas P. Kersten. Munich, 2019. pp. 363-374 (Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.).
@inproceedings{4252b49b1bd44f02abb297de1c41dc41,
title = "Encoder-Decoder network for local structure preserving stereo matching",
abstract = "After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements comprise two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.",
author = "Junhua Kang and Lin Chen and Fei Deng and Christian Heipke",
note = "Funding Information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her as a visiting PhD student at Leibniz University Hannover, Germany. We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.; Dreil{\"a}ndertagung OVG-DGPF-SGPF ; Conference date: 20-02-2019 Through 22-02-2019",
year = "2019",
language = "English",
series = "Publikationen der Deutschen Gesellschaft f{\"u}r Photogrammetrie, Fernerkundung und Geoinformation e.V.",
publisher = "Deutsche Gesellschaft f{\"u}r Photogrammetrie, Fernerkundung und Geoinformation e.V.",
pages = "363--374",
editor = "Kersten, {Thomas P.}",
booktitle = "Beitr{\"a}ge",
url = "https://dgpf.de/con/jt2019.html",

}


TY - GEN

T1 - Encoder-Decoder network for local structure preserving stereo matching

AU - Kang, Junhua

AU - Chen, Lin

AU - Deng, Fei

AU - Heipke, Christian

N1 - Funding Information: The author Junhua Kang would like to thank the China Scholarship Council (CSC) for financially supporting her as a visiting PhD student at Leibniz University Hannover, Germany. We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.

PY - 2019

Y1 - 2019

AB - After many years of research, stereo matching remains a challenging task in photogrammetry and computer vision. Recent work has shown great progress by formulating dense stereo matching as a pixel-wise learning task to be solved with a deep convolutional neural network (CNN). In this paper we investigate a recently proposed end-to-end disparity learning network, DispNet (MAYER et al. 2015), and improve it to yield better results in some problematic areas. The improvements comprise two major contributions. First, in order to handle large disparities, we modify the correlation module to construct the matching cost volume with patch-based correlation. We also modify the basic encoder-decoder module to regress detailed disparity images at full resolution. Second, instead of using post-processing steps to impose smoothness and handle depth discontinuities, we incorporate disparity gradient information as a regularizer to preserve local structure details in areas with large depth discontinuities. We evaluate our model in terms of end-point error on several challenging stereo datasets such as Scene Flow, Sintel and KITTI. Experimental results demonstrate that our model achieves better performance than DispNet on most datasets (e.g. we obtain an improvement of 36% on Sintel) and estimates better structure-preserving disparity maps. Moreover, our proposal also achieves competitive performance compared to other methods.

M3 - Conference contribution

T3 - Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V.

SP - 363

EP - 374

BT - Beiträge

A2 - Kersten, Thomas P.

CY - München

T2 - Dreiländertagung OVG-DGPF-SGPF

Y2 - 20 February 2019 through 22 February 2019

ER -