Robust Shape Fitting for 3D Scene Abstraction

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Florian Kluger
  • Eric Brachmann
  • Michael Ying Yang
  • Bodo Rosenhahn

External Research Organisations

  • Niantic Inc.
  • University of Bath

Details

Original language: English
Pages (from-to): 6306-6325
Number of pages: 20
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 46
Issue number: 9
Early online date: 19 Mar 2024
Publication status: Published - Sept 2024

Abstract

Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
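
The idea of a RANSAC estimator whose sampling is guided by learned weights can be illustrated with a much simpler primitive. The following is a minimal sketch only, not the authors' method: it fits a single plane instead of a cuboid, and the per-point sampling weights, which in the paper would come from a neural network, are hand-set here as an assumption. The function names `fit_plane` and `guided_ransac`, the threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_plane(p):
    """Minimal solver: plane through three points, as unit normal n and offset d (n.x + d = 0)."""
    n = np.cross(p[1] - p[0], p[2] - p[0])
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        return None  # degenerate sample (collinear points)
    n = n / norm
    return n, -n @ p[0]


def guided_ransac(points, weights, iters=200, thresh=0.02, seed=0):
    """RANSAC where minimal sets are drawn according to per-point sampling weights."""
    rng = np.random.default_rng(seed)
    probs = weights / weights.sum()
    best_model, best_inliers = None, 0
    for _ in range(iters):
        # Guided sampling: points with higher weight are drawn more often.
        idx = rng.choice(len(points), size=3, replace=False, p=probs)
        model = fit_plane(points[idx])
        if model is None:
            continue
        n, d = model
        # Score the hypothesis by counting points within the inlier threshold.
        inliers = int((np.abs(points @ n + d) < thresh).sum())
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic scene (assumption): 200 noisy points near the plane z = 0 plus 100 clutter points.
data_rng = np.random.default_rng(1)
plane_pts = np.column_stack(
    [data_rng.uniform(-1, 1, (200, 2)), data_rng.normal(0, 0.005, 200)]
)
clutter = data_rng.uniform(-1, 1, (100, 3))
points = np.vstack([plane_pts, clutter])

# Stand-in for network-predicted weights: plane points are favoured over clutter.
weights = np.concatenate([np.full(200, 1.0), np.full(100, 0.1)])

model, n_inliers = guided_ransac(points, weights)
normal, offset = model
print("normal:", normal, "offset:", offset, "inliers:", n_inliers)
```

With uniform weights the same loop reduces to plain RANSAC; concentrating the sampling probability on likely inliers raises the chance that a minimal set is outlier-free, which is the motivation for guiding the sampler.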

Keywords

    cuboid fitting, Estimation, Image reconstruction, minimal solver, multi-model fitting, Scene abstraction, Shape, shape decomposition, Solid modeling, Surface reconstruction, Three-dimensional displays, Training

Cite this

Robust Shape Fitting for 3D Scene Abstraction. / Kluger, Florian; Brachmann, Eric; Yang, Michael Ying et al.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, No. 9, 09.2024, p. 6306-6325.

Kluger F, Brachmann E, Yang MY, Rosenhahn B. Robust Shape Fitting for 3D Scene Abstraction. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2024 Sept;46(9):6306-6325. Epub 2024 Mar 19. doi: 10.48550/arXiv.2403.10452, 10.1109/TPAMI.2024.3379014
Kluger, Florian ; Brachmann, Eric ; Yang, Michael Ying et al. / Robust Shape Fitting for 3D Scene Abstraction. In: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2024 ; Vol. 46, No. 9. pp. 6306-6325.
@article{efa0adbf6c6548cda932c632436b0a2d,
title = "Robust Shape Fitting for 3D Scene Abstraction",
keywords = "cuboid fitting, Estimation, Image reconstruction, minimal solver, multi-model fitting, Scene abstraction, Shape, shape decomposition, Solid modeling, Surface reconstruction, Three-dimensional displays, Training",
author = "Florian Kluger and Eric Brachmann and Yang, {Michael Ying} and Bodo Rosenhahn",
note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",
year = "2024",
month = sep,
doi = "10.48550/arXiv.2403.10452",
language = "English",
volume = "46",
pages = "6306--6325",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
number = "9",

}

TY - JOUR
T1 - Robust Shape Fitting for 3D Scene Abstraction
AU - Kluger, Florian
AU - Brachmann, Eric
AU - Yang, Michael Ying
AU - Rosenhahn, Bodo
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2024/9
Y1 - 2024/9
KW - cuboid fitting
KW - Estimation
KW - Image reconstruction
KW - minimal solver
KW - multi-model fitting
KW - Scene abstraction
KW - Shape
KW - shape decomposition
KW - Solid modeling
KW - Surface reconstruction
KW - Three-dimensional displays
KW - Training
UR - http://www.scopus.com/inward/record.url?scp=85188527432&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2403.10452
DO - 10.48550/arXiv.2403.10452
M3 - Article
AN - SCOPUS:85188527432
VL - 46
SP - 6306
EP - 6325
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
IS - 9
ER -
