Details
| Original language | English |
| --- | --- |
| Pages (from-to) | 30-39 |
| Number of pages | 10 |
| Journal | ISPRS Journal of Photogrammetry and Remote Sensing |
| Volume | 202 |
| Early online date | 8 Jun 2023 |
| Publication status | Published - Aug 2023 |
Abstract
We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is far more time-consuming and error-prone than annotating 2D images. On the one hand, this means that one cannot afford to create a large enough training dataset for each new project. On the other hand, it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only a small amount of dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck, we explore the possibility of transferring knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images in a fully calibrated setting, which makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, and similar effects degrade the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints, together with 3D geometry information, into a common representation that encodes point-wise 3D semantics. To validate our approach, we use a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.
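To make the “naive” baseline mentioned in the abstract concrete, below is a minimal sketch (in Python/NumPy, not taken from the paper) of per-point label transfer by projection: each 3D point is projected into every calibrated camera, the predicted 2D label at the resulting pixel is read out, and the per-point label is set by majority vote over the views that see it. All names (`project`, `naive_label_transfer`, the camera and label-map inputs) are illustrative assumptions, and occlusion handling is deliberately omitted, which is one source of the label noise the paper sets out to correct.

```python
# Illustrative sketch of naive 2D-to-3D label transfer (not the authors' code).
import numpy as np

def project(points, K, R, t):
    """Project Nx3 world points into pixel coordinates with a pinhole camera.

    K: 3x3 intrinsics, R/t: world-to-camera rotation and translation (assumed names).
    """
    cam = points @ R.T + t            # world -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]     # perspective division
    pix = uv @ K[:2, :2].T + K[:2, 2] # apply focal lengths and principal point
    return pix, cam[:, 2]             # pixel coordinates and depth

def naive_label_transfer(points, cameras, label_maps, num_classes):
    """Majority vote over per-view 2D labels; returns -1 where no view sees a point."""
    votes = np.zeros((len(points), num_classes), dtype=np.int32)
    for (K, R, t), labels in zip(cameras, label_maps):
        pix, depth = project(points, K, R, t)
        u = np.round(pix[:, 0]).astype(int)
        v = np.round(pix[:, 1]).astype(int)
        h, w = labels.shape
        # Keep only points in front of the camera and inside the image
        # (no occlusion test, so hidden surfaces still collect votes).
        visible = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.where(visible)[0]
        votes[idx, labels[v[idx], u[idx]]] += 1
    out = votes.argmax(axis=1)
    out[votes.sum(axis=1) == 0] = -1  # point not observed by any view
    return out
```

The learned multi-view fusion proposed in the paper replaces this hard majority vote with a representation that merges image evidence and 3D geometry per point; the sketch only illustrates the simplistic transfer used as a baseline.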
Keywords
- 3D point clouds
- Convolutional neural network (CNN)
- Label transfer
- Multi-view
- Semantic segmentation
ASJC Scopus subject areas
- Physics and Astronomy (all)
- Atomic and Molecular Physics, and Optics
- Engineering (all)
- Engineering (miscellaneous)
- Computer Science (all)
- Computer Science Applications
- Earth and Planetary Sciences (all)
- Computers in Earth Sciences
Cite this
Semantic segmentation of mobile mapping point clouds via multi-view label transfer. / Peters, Torben; Brenner, Claus; Schindler, Konrad. In: ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 202, 08.2023, p. 30-39.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Semantic segmentation of mobile mapping point clouds via multi-view label transfer
AU - Peters, Torben
AU - Brenner, Claus
AU - Schindler, Konrad
N1 - Funding Information: Part of the research was carried out within the Research Training Group GRK2159, ‘Integrity and collaboration in dynamic sensor networks’ (i.c.sens), which is funded by the German Research Foundation (DFG).
PY - 2023/8
Y1 - 2023/8
N2 - We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is far more time-consuming and error-prone than annotating 2D images. On the one hand, this means that one cannot afford to create a large enough training dataset for each new project. On the other hand, it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only a small amount of dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck, we explore the possibility of transferring knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images in a fully calibrated setting, which makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, and similar effects degrade the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints, together with 3D geometry information, into a common representation that encodes point-wise 3D semantics. To validate our approach, we use a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.
AB - We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is far more time-consuming and error-prone than annotating 2D images. On the one hand, this means that one cannot afford to create a large enough training dataset for each new project. On the other hand, it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only a small amount of dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck, we explore the possibility of transferring knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images in a fully calibrated setting, which makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, and similar effects degrade the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints, together with 3D geometry information, into a common representation that encodes point-wise 3D semantics. To validate our approach, we use a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.
KW - 3D point clouds
KW - Convolutional neural network (CNN)
KW - Label transfer
KW - Multi-view
KW - Semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85161691263&partnerID=8YFLogxK
U2 - 10.1016/j.isprsjprs.2023.05.018
DO - 10.1016/j.isprsjprs.2023.05.018
M3 - Article
AN - SCOPUS:85161691263
VL - 202
SP - 30
EP - 39
JO - ISPRS Journal of Photogrammetry and Remote Sensing
JF - ISPRS Journal of Photogrammetry and Remote Sensing
SN - 0924-2716
ER -