Details
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 13065-13074 |
| Number of pages | 10 |
| ISBN (electronic) | 978-1-6654-4509-2 |
| ISBN (print) | 978-1-6654-4510-8 |
| Publication status | Published - 2021 |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| --- | --- |
| ISSN (print) | 1063-6919 |
| ISSN (electronic) | 2575-7075 |
Abstract

Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
Keywords
- cs.CV
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Computer Vision and Pattern Recognition
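The record's abstract describes a RANSAC estimator that fits cuboids using an occlusion-aware point-to-primitive distance, so that a fit cannot "explain" points by hiding them behind a large cuboid. The sketch below illustrates those two ingredients for a single axis-aligned cuboid with the camera at the origin; the function names, the sampling-based occlusion test, and the penalty weight are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def point_to_cuboid_distance(points, center, half_extents):
    """Unsigned distance from 3D points (N, 3) to the surface of an
    axis-aligned cuboid given by its center and half side lengths."""
    q = np.abs(points - center) - half_extents           # per-axis excess
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(np.max(q, axis=-1), 0.0)         # <= 0 inside the box
    return np.abs(outside + inside)

def occludes(points, center, half_extents, n_samples=64):
    """Sampling-based check: does the cuboid block the camera ray from the
    origin to each observed point before the ray reaches that point?"""
    t = np.linspace(0.05, 0.95, n_samples)[None, :, None]
    samples = points[:, None, :] * t                     # (N, T, 3) ray samples
    in_box = np.all(np.abs(samples - center) <= half_extents, axis=-1)
    return in_box.any(axis=1)

def occlusion_aware_loss(points, center, half_extents, penalty=10.0):
    """Point-to-cuboid distance plus a penalty (an assumed, hand-picked
    weight) whenever the cuboid occludes the observed point."""
    d = point_to_cuboid_distance(points, center, half_extents)
    return d + penalty * occludes(points, center, half_extents)
```

For example, a point at (0, 0, 5) sitting behind a unit cuboid centred at (0, 0, 2) is flagged as occluded and penalised, while a point at (5, 0, 0) off to the side only incurs its plain surface distance.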
Cite this
Kluger, F., Ackermann, H., Brachmann, E., Yang, M. Y., & Rosenhahn, B. (2021). Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 13065-13074). Institute of Electrical and Electronics Engineers Inc.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Cuboids Revisited
T2 - Learning Robust 3D Shape Fitting to Single RGB Images
AU - Kluger, Florian
AU - Ackermann, Hanno
AU - Brachmann, Eric
AU - Yang, Michael Ying
AU - Rosenhahn, Bodo
N1 - Funding Information: This work was supported by the BMBF grant LeibnizAILab (01DD20003), by the DFG grant COVMAP (RO 2497/12-2), by the DFG Cluster of Excellence PhoenixD (EXC 2122), and by the Center for Digital Innovations (ZDIN).
PY - 2021
Y1 - 2021
N2 - Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
AB - Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
KW - cs.CV
UR - http://www.scopus.com/inward/record.url?scp=85121017940&partnerID=8YFLogxK
U2 - 10.1109/CVPR46437.2021.01287
DO - 10.1109/CVPR46437.2021.01287
M3 - Conference contribution
SN - 978-1-6654-4510-8
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 13065
EP - 13074
BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
ER -