Details
Original language | English |
---|---|
Pages (from-to) | 483-498 |
Number of pages | 16 |
Journal | PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science |
Volume | 92 |
Issue number | 5 |
Early online date | 18 Sept 2024 |
Publication status | Published - Oct 2024 |
Abstract
An accurate 3D representation of the geometry and semantics of an environment forms the basis for a large variety of downstream tasks and is essential for autonomous driving tasks such as path planning and obstacle avoidance. This work focuses on 3D semantic occupancy prediction, i.e., the reconstruction of a scene as a voxel grid where each voxel is assigned both an occupancy and a semantic label. We present a Convolutional Neural Network-based method that takes as input multiple color images from a surround-view setup with minimal overlap, together with the associated interior and exterior camera parameters, and reconstructs the observed environment as a 3D semantic occupancy map. To account for the ill-posed nature of reconstructing a 3D representation from monocular 2D images, the image information is integrated over time: under the assumption that the camera setup is moving, images from consecutive time steps are used to form a multi-view stereo setup. In exhaustive experiments, we investigate the challenges presented by dynamic objects and the possibilities of training the proposed method with either 3D or 2D reference data, the latter being motivated by the comparably higher costs of generating and annotating 3D ground truth data. Moreover, we present and investigate a novel self-supervised training scheme that does not require any geometric reference data, but only relies on sparse semantic ground truth. An evaluation on the Occ3D dataset, including a comparison against current state-of-the-art self-supervised methods from the literature, demonstrates the potential of our self-supervised variant.
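As a rough illustration of the representation described in the abstract, the following Python sketch stores a scene as a voxel grid with one semantic label per voxel and projects voxel centers into a single camera using interior (intrinsic) and exterior (extrinsic) parameters. It is not the authors' implementation: the names `FREE`, `voxel_centers` and `project`, the pinhole model without distortion, and all numeric values are assumptions made for this example.

```python
import numpy as np

FREE = 0  # hypothetical label id meaning "voxel is not occupied"


def voxel_centers(grid_shape, voxel_size, origin):
    """Return the (N, 3) world coordinates of all voxel centers."""
    axes = [np.arange(n) for n in grid_shape]
    idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    return origin + (idx + 0.5) * voxel_size


def project(points_world, K, R, t):
    """Project world points with an assumed pinhole model: x_cam = R x + t."""
    cam = points_world @ R.T + t
    in_front = cam[:, 2] > 1e-6
    # Use a dummy depth of 1 for points behind the camera; they are masked out.
    depth = np.where(in_front, cam[:, 2], 1.0)
    uv = (cam @ K.T)[:, :2] / depth[:, None]
    return uv, in_front


# A tiny 4 x 4 x 2 grid: every voxel holds one semantic label.
labels = np.full((4, 4, 2), FREE, dtype=np.int64)
labels[1, 2, 0] = 3            # hypothetical class id, e.g. "car"
occupancy = labels != FREE     # occupancy follows from the semantic label

# Illustrative camera: identity rotation, grid shifted 2 m in front of it.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

centers = voxel_centers(labels.shape, 0.5, np.zeros(3))
uv, visible = project(centers, K, R, t)
```

Repeating the projection for each camera and each consecutive time step, with the pose of each frame, yields the image locations at which a given voxel is observed, which is the geometric relation a multi-view setup relies on.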
Keywords
- 3D Occupancy Prediction, 3D Perception, NeRF, Semantic Scene Completion
ASJC Scopus subject areas
- Social Sciences(all)
- Geography, Planning and Development
- Physics and Astronomy(all)
- Instrumentation
- Earth and Planetary Sciences(all)
- Earth and Planetary Sciences (miscellaneous)
Cite this
In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, Vol. 92, No. 5, 10.2024, p. 483-498.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Self-Supervised 3D Semantic Occupancy Prediction from Multi-View 2D Surround Images
AU - Abualhanud, S.
AU - Erahan, E.
AU - Mehltretter, M.
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/10
Y1 - 2024/10
KW - 3D Occupancy Prediction
KW - 3D Perception
KW - NeRF
KW - Semantic Scene Completion
UR - http://www.scopus.com/inward/record.url?scp=85204175168&partnerID=8YFLogxK
U2 - 10.1007/s41064-024-00308-9
DO - 10.1007/s41064-024-00308-9
M3 - Article
AN - SCOPUS:85204175168
VL - 92
SP - 483
EP - 498
JO - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
JF - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
SN - 2512-2789
IS - 5
ER -