Details
| Original language | English |
| --- | --- |
| Title of host publication | Multimodal Scene Understanding |
| Subtitle of host publication | Algorithms, Applications and Deep Learning |
| Editors | Michael Ying Yang, Bodo Rosenhahn, Vittorio Murino |
| Publisher | Elsevier |
| Pages | 65-100 |
| Number of pages | 36 |
| ISBN (electronic) | 9780128173589 |
| Publication status | Published - 2 Aug 2019 |
Abstract
In recent years, convolutional neural networks have shown great success in various computer vision tasks such as classification, object detection, and scene analysis. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. The need for such large amounts of data, however, limits possible applications: while raw data can be collected quickly, supervised learning additionally requires labels, and labeling data is usually very time-consuming and expensive. This chapter addresses the problem of learning with very little labeled data for extracting information about the infrastructure in urban areas. The aim is to recognize particular traffic signs in crowdsourced data in order to collect information that is of interest to cyclists. The presented system for object detection is trained with very few training examples. To achieve this, the advantages of convolutional neural networks and random forests are combined to learn a patch-wise classifier. In a second step, the random forest is mapped to a neural network, and the classifier is transformed into a fully convolutional network; this significantly accelerates the processing of full images and allows bounding boxes to be predicted. Finally, GPS data is integrated to localize the predictions on the map, and multiple observations are merged to further improve the localization accuracy. In comparison to Faster R-CNN and other networks for object detection, as well as algorithms for transfer learning, the required amount of labeled data is considerably reduced.
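The key step the abstract describes, mapping a trained random forest to a neural network, follows a classical construction (one hidden layer of split units and one hidden layer of leaf units that AND the path decisions). The sketch below illustrates the idea for a single decision tree; it is a minimal illustrative example, not the authors' implementation, and the toy data and function names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy 1-D data standing in for patch features; any separable data works.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
t = tree.tree_  # low-level arrays: children_left/right, feature, threshold, value


def leaf_paths(node=0, path=()):
    """Enumerate (leaf_id, [(split_id, went_left), ...]) for every leaf."""
    if t.children_left[node] == -1:  # sklearn marks leaves with -1
        return [(node, list(path))]
    return (leaf_paths(t.children_left[node], path + ((node, True),)) +
            leaf_paths(t.children_right[node], path + ((node, False),)))


def nn_predict(x):
    """Evaluate the tree as a two-hidden-layer network:
    layer 1 = one hard-threshold unit per split node,
    layer 2 = one unit per leaf, firing iff all path decisions hold,
    output  = the class stored at the single active leaf."""
    # Layer 1: split units compute x[feature] <= threshold.
    h1 = {n: bool(x[t.feature[n]] <= t.threshold[n])
          for n in range(t.node_count) if t.children_left[n] != -1}
    # Layer 2: a leaf unit is an AND over the decisions on its path.
    for leaf, path in leaf_paths():
        if all(h1[n] == went_left for n, went_left in path):
            return int(np.argmax(t.value[leaf]))


# The network view agrees with the tree on every input by construction.
assert all(nn_predict(r) == tree.predict([r])[0] for r in X)
```

In the chapter's pipeline this mapping is what allows the whole classifier to be expressed as network layers and then converted into a fully convolutional network, so that one forward pass scores every patch of a full image at once.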
Keywords
- Convolutional neural networks, Localization, Object detection, Random forests
ASJC Scopus subject areas
- Computer Science (all)
- General Computer Science
Cite this
Multimodal Scene Understanding: Algorithms, Applications and Deep Learning. ed. / Michael Ying Yang; Bodo Rosenhahn; Vittorio Murino. Elsevier, 2019. p. 65-100.
Research output: Chapter in book/report/conference proceeding › Contribution to book/anthology › Research › peer review
TY - CHAP
T1 - Learning convolutional neural networks for object detection with very little training data
AU - Reinders, Christoph
AU - Ackermann, Hanno
AU - Yang, Michael Ying
AU - Rosenhahn, Bodo
PY - 2019/8/2
Y1 - 2019/8/2
AB - In recent years, convolutional neural networks have shown great success in various computer vision tasks such as classification, object detection, and scene analysis. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. The need for such large amounts of data, however, limits possible applications: while raw data can be collected quickly, supervised learning additionally requires labels, and labeling data is usually very time-consuming and expensive. This chapter addresses the problem of learning with very little labeled data for extracting information about the infrastructure in urban areas. The aim is to recognize particular traffic signs in crowdsourced data in order to collect information that is of interest to cyclists. The presented system for object detection is trained with very few training examples. To achieve this, the advantages of convolutional neural networks and random forests are combined to learn a patch-wise classifier. In a second step, the random forest is mapped to a neural network, and the classifier is transformed into a fully convolutional network; this significantly accelerates the processing of full images and allows bounding boxes to be predicted. Finally, GPS data is integrated to localize the predictions on the map, and multiple observations are merged to further improve the localization accuracy. In comparison to Faster R-CNN and other networks for object detection, as well as algorithms for transfer learning, the required amount of labeled data is considerably reduced.
KW - Convolutional neural networks
KW - Localization
KW - Object detection
KW - Random forests
UR - http://www.scopus.com/inward/record.url?scp=85082047720&partnerID=8YFLogxK
U2 - 10.1016/b978-0-12-817358-9.00010-x
DO - 10.1016/b978-0-12-817358-9.00010-x
M3 - Contribution to book/anthology
AN - SCOPUS:85082047720
SP - 65
EP - 100
BT - Multimodal Scene Understanding
A2 - Yang, Michael Ying
A2 - Rosenhahn, Bodo
A2 - Murino, Vittorio
PB - Elsevier
ER -