Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Florian Kluger
  • Hanno Ackermann
  • Eric Brachmann
  • Michael Ying Yang
  • Bodo Rosenhahn

External Research Organisations

  • Niantic Inc.
  • University of Twente

Details

Original language: English
Title of host publication: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 13065-13074
Number of pages: 10
ISBN (electronic): 978-1-6654-4509-2
ISBN (print): 978-1-6654-4510-8
Publication status: Published - 2021

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (print): 1063-6919
ISSN (electronic): 2575-7075

Abstract

Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
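To make the occlusion-aware idea concrete: a minimal NumPy sketch of a point-to-cuboid distance with an occlusion penalty, for illustration only. It is not the authors' implementation; it assumes axis-aligned cuboids and a pinhole camera at the origin, whereas the paper fits oriented cuboids with a learned, neurally guided RANSAC. The intuition matches the abstract: a point the cuboid would occlude (the camera ray enters the cuboid before reaching the point) incurs an extra penalty, so large spurious cuboids that cover the scene behind them score poorly.

```python
import numpy as np

def cuboid_sdf(points, center, half_extents):
    """Signed distance from points (N, 3) to an axis-aligned cuboid
    (negative inside, positive outside)."""
    d = np.abs(points - center) - half_extents
    outside = np.linalg.norm(np.maximum(d, 0.0), axis=-1)
    inside = np.minimum(np.max(d, axis=-1), 0.0)
    return outside + inside

def occlusion_aware_distance(points, center, half_extents):
    """Surface distance plus a penalty for points (N, 3) that the cuboid
    would occlude, as seen from a pinhole camera at the origin.
    Assumes all points have positive depth (norm > 0)."""
    surf = np.abs(cuboid_sdf(points, center, half_extents))
    depth = np.linalg.norm(points, axis=-1)
    dirs = points / depth[:, None]          # unit viewing rays from the camera
    # Slab test: ray parameters where each ray enters/exits the cuboid.
    # Division by a zero direction component yields +/-inf, which the
    # min/max reductions handle correctly; suppress the warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = (center - half_extents) / dirs  # camera / ray origin is 0
        t2 = (center + half_extents) / dirs
    t_near = np.max(np.minimum(t1, t2), axis=-1)
    t_far = np.min(np.maximum(t1, t2), axis=-1)
    hits = (t_near <= t_far) & (t_far > 0.0)
    # Occluded: the ray pierces the cuboid strictly before reaching the point.
    occluded = hits & (t_near < depth)
    # Penalise by the gap between the point and the front face it hides behind.
    penalty = np.where(occluded, np.abs(depth - t_near), 0.0)
    return np.maximum(surf, penalty)
```

A cuboid sitting in front of observed points is thus penalised by how far behind its front face those points lie, even when they are close to its back surface, which is the failure mode of naive point-to-primitive distances described above.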

Keywords

    cs.CV

Cite this

Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images. / Kluger, Florian; Ackermann, Hanno; Brachmann, Eric et al.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2021. p. 13065-13074 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Kluger, F, Ackermann, H, Brachmann, E, Yang, MY & Rosenhahn, B 2021, Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., pp. 13065-13074. https://doi.org/10.1109/CVPR46437.2021.01287
Kluger, F., Ackermann, H., Brachmann, E., Yang, M. Y., & Rosenhahn, B. (2021). Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 13065-13074). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CVPR46437.2021.01287
Kluger F, Ackermann H, Brachmann E, Yang MY, Rosenhahn B. Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc. 2021. p. 13065-13074. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR46437.2021.01287
Kluger, Florian ; Ackermann, Hanno ; Brachmann, Eric et al. / Cuboids Revisited : Learning Robust 3D Shape Fitting to Single RGB Images. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 13065-13074 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).
BibTeX
@inproceedings{d8054db7c5e2469792e06b815d9b8441,
title = "Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images",
abstract = " Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts. ",
keywords = "cs.CV",
author = "Florian Kluger and Hanno Ackermann and Eric Brachmann and Yang, {Michael Ying} and Bodo Rosenhahn",
note = "Funding Information: This work was supported by the BMBF grant LeibnizAILab (01DD20003), by the DFG grant COVMAP (RO 2497/12-2), by the DFG Cluster of Excellence PhoenixD (EXC 2122), and by the Center for Digital Innovations (ZDIN).",
year = "2021",
doi = "10.1109/CVPR46437.2021.01287",
language = "English",
isbn = "978-1-6654-4510-8",
series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "13065--13074",
booktitle = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
address = "United States",

}

RIS

TY - GEN

T1 - Cuboids Revisited

T2 - Learning Robust 3D Shape Fitting to Single RGB Images

AU - Kluger, Florian

AU - Ackermann, Hanno

AU - Brachmann, Eric

AU - Yang, Michael Ying

AU - Rosenhahn, Bodo

N1 - Funding Information: This work was supported by the BMBF grant LeibnizAILab (01DD20003), by the DFG grant COVMAP (RO 2497/12-2), by the DFG Cluster of Excellence PhoenixD (EXC 2122), and by the Center for Digital Innovations (ZDIN).

PY - 2021

Y1 - 2021

N2 - Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.

AB - Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.

KW - cs.CV

UR - http://www.scopus.com/inward/record.url?scp=85121017940&partnerID=8YFLogxK

U2 - 10.1109/CVPR46437.2021.01287

DO - 10.1109/CVPR46437.2021.01287

M3 - Conference contribution

SN - 978-1-6654-4510-8

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 13065

EP - 13074

BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
