Semantic segmentation of mobile mapping point clouds via multi-view label transfer

Publication: Contribution to journal, Article, Research, Peer-reviewed

Authors

  • Torben Peters
  • Claus Brenner
  • Konrad Schindler

External organisations

  • ETH Zürich

Details

Original language: English
Pages (from - to): 30-39
Number of pages: 10
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 202
Early online date: 8 June 2023
Publication status: Published - Aug. 2023

Abstract

We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is a lot more time-consuming and error-prone than annotating 2D images. On the one hand this means that one cannot afford to create a large enough training dataset for each new project. On the other hand it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only little dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck we explore the possibility to transfer knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images, in a fully calibrated setting that makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, etc. affect the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints, and 3D geometry information, into a common representation that encodes point-wise 3D semantics. To validate our approach we make use of a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer, and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.
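
The naive baseline mentioned in the abstract (segment each image, then map the labels onto the point cloud) can be illustrated with a minimal sketch. This is not the authors' learned method. It assumes an undistorted pinhole camera, ignores occlusion, and resolves disagreements between views by a simple majority vote. The function names `project_points` and `naive_label_transfer`, and the `(label_img, K, R, t)` view layout, are hypothetical and chosen only for illustration.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project N x 3 world points with a pinhole model (assumed, no distortion).

    Returns pixel coordinates (N x 2) and the depth of each point in the camera frame.
    """
    cam = points @ R.T + t                      # world frame -> camera frame
    depth = cam[:, 2]
    safe = np.where(depth > 0, depth, 1.0)      # placeholder divisor for points behind the camera
    uv = (cam @ K.T)[:, :2] / safe[:, None]     # perspective division
    return uv, depth

def naive_label_transfer(points, views, num_classes):
    """Majority-vote 2D labels onto 3D points; occlusion is NOT handled.

    views: iterable of (label_img, K, R, t), where label_img is an H x W integer
    array of per-pixel class ids (hypothetical layout).
    Points that no view observes receive the label -1.
    """
    votes = np.zeros((len(points), num_classes), dtype=np.int64)
    for label_img, K, R, t in views:
        h, w = label_img.shape
        uv, depth = project_points(points, K, R, t)
        cols = np.floor(uv[:, 0]).astype(int)
        rows = np.floor(uv[:, 1]).astype(int)
        visible = (depth > 0) & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
        idx = np.nonzero(visible)[0]
        # Each point appears at most once per view, so this increment is safe.
        votes[idx, label_img[rows[idx], cols[idx]]] += 1
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1
    return labels
```

Because every visible pixel votes with equal weight and occluded points still collect votes from whatever surface lies in front of them, a sketch like this exhibits the kind of label noise the abstract attributes to simplistic label transfer.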

Cite this

Semantic segmentation of mobile mapping point clouds via multi-view label transfer. / Peters, Torben; Brenner, Claus; Schindler, Konrad.
In: ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 202, 08.2023, p. 30-39.

Peters T, Brenner C, Schindler K. Semantic segmentation of mobile mapping point clouds via multi-view label transfer. ISPRS Journal of Photogrammetry and Remote Sensing. 2023 Aug;202:30-39. Epub 2023 Jun 8. doi: 10.1016/j.isprsjprs.2023.05.018
Peters, Torben ; Brenner, Claus ; Schindler, Konrad. / Semantic segmentation of mobile mapping point clouds via multi-view label transfer. In: ISPRS Journal of Photogrammetry and Remote Sensing. 2023 ; Vol. 202. pp. 30-39.
Download
@article{4048898567a94bf6b882d2e0469b06c9,
title = "Semantic segmentation of mobile mapping point clouds via multi-view label transfer",
abstract = "We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is a lot more time-consuming and error-prone than annotating 2D images. On the one hand this means that one cannot afford to create a large enough training dataset for each new project. On the other hand it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only little dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck we explore the possibility to transfer knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images, in a fully calibrated setting that makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, etc. affect the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints, and 3D geometry information, into a common representation that encodes point-wise 3D semantics. To validate our approach we make use of a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer, and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.",
keywords = "3D point clouds, Convolutional neural network (CNN), Label transfer, Multi-view, Semantic segmentation",
author = "Torben Peters and Claus Brenner and Konrad Schindler",
note = "Funding Information: Part of the research was made within the Research Training Group GRK2159, {\textquoteleft}Integrity and collaboration in dynamic sensor networks{\textquoteright} (i.c.sens), which is funded by the German Research Foundation (DFG).",
year = "2023",
month = aug,
doi = "10.1016/j.isprsjprs.2023.05.018",
language = "English",
volume = "202",
pages = "30--39",
journal = "ISPRS Journal of Photogrammetry and Remote Sensing",
issn = "0924-2716",
publisher = "Elsevier",

}

TY - JOUR

T1 - Semantic segmentation of mobile mapping point clouds via multi-view label transfer

AU - Peters, Torben

AU - Brenner, Claus

AU - Schindler, Konrad

N1 - Funding Information: Part of the research was made within the Research Training Group GRK2159, ‘Integrity and collaboration in dynamic sensor networks’ (i.c.sens), which is funded by the German Research Foundation (DFG).

PY - 2023/8

Y1 - 2023/8

AB - We study how to learn semantic segmentation of 3D point clouds from small training sets. The problem arises because annotating 3D point clouds is a lot more time-consuming and error-prone than annotating 2D images. On the one hand this means that one cannot afford to create a large enough training dataset for each new project. On the other hand it also means that there is not nearly as much public data available as there is for images, which one could use to pretrain a generic feature extractor that could then, with only little dedicated training data, be adapted (“fine-tuned”) to the task at hand. To address this bottleneck we explore the possibility to transfer knowledge from the 2D image domain to 3D point clouds. That strategy is of particular interest for mobile mapping systems that capture both point clouds and images, in a fully calibrated setting that makes it easy to connect the two domains. We find that, as expected, naively segmenting in image space and mapping the resulting labels onto the point cloud is not sufficient, as visual ambiguities, residual calibration errors, etc. affect the result. Instead, we propose a system that learns to merge image evidence from a varying number of viewpoints, and 3D geometry information, into a common representation that encodes point-wise 3D semantics. To validate our approach we make use of a new mobile mapping dataset with 88M annotated 3D points and 2205 oriented multi-view images. In a series of experiments, we show how much label noise is caused by simplistic label transfer, and how well existing semantic segmentation architectures can correct it. Finally, we demonstrate that adding our learned 2D-to-3D multi-view label transfer significantly improves the performance of different segmentation backbones.

KW - 3D point clouds

KW - Convolutional neural network (CNN)

KW - Label transfer

KW - Multi-view

KW - Semantic segmentation

UR - http://www.scopus.com/inward/record.url?scp=85161691263&partnerID=8YFLogxK

U2 - 10.1016/j.isprsjprs.2023.05.018

DO - 10.1016/j.isprsjprs.2023.05.018

M3 - Article

AN - SCOPUS:85161691263

VL - 202

SP - 30

EP - 39

JO - ISPRS Journal of Photogrammetry and Remote Sensing

JF - ISPRS Journal of Photogrammetry and Remote Sensing

SN - 0924-2716

ER -