Learning convolutional neural networks for object detection with very little training data

Research output: Chapter in book/report/conference proceeding › Contribution to book/anthology › Research › peer review

Authors

  • Christoph Reinders
  • Hanno Ackermann
  • Michael Ying Yang
  • Bodo Rosenhahn

External Research Organisations

  • University of Twente

Details

Original language: English
Title of host publication: Multimodal Scene Understanding
Subtitle of host publication: Algorithms, Applications and Deep Learning
Editors: Michael Ying Yang, Bodo Rosenhahn, Vittorio Murino
Publisher: Elsevier
Pages: 65-100
Number of pages: 36
ISBN (electronic): 9780128173589
Publication status: Published - 2 Aug 2019

Abstract

In recent years, convolutional neural networks have shown great success in various computer vision tasks such as classification, object detection, and scene analysis. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. The need for sufficient data, however, limits the range of possible applications. While large amounts of data can be collected quickly, supervised learning additionally requires that the data be labeled. Labeling data, unfortunately, is usually very time-consuming and costly. This chapter addresses the problem of learning with very little labeled data for extracting information about the infrastructure in urban areas. The aim is to recognize particular traffic signs in crowdsourced data to collect information which is of interest to cyclists. The presented system for object detection is trained with very few training examples. To achieve this, the advantages of convolutional neural networks and random forests are combined to learn a patch-wise classifier. In the next step, the random forest is mapped to a neural network and the classifier is transformed to a fully convolutional network. Thereby, the processing of full images is significantly accelerated and bounding boxes can be predicted. Finally, GPS data is integrated to localize the predictions on the map, and multiple observations are merged to further improve the localization accuracy. In comparison to Faster R-CNN and other networks for object detection or algorithms for transfer learning, the required amount of labeled data is considerably reduced.
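
The abstract gives no implementation details, so the following is only a minimal sketch of one step it mentions: turning a patch-wise classifier into a fully convolutional one. It assumes the classifier is a single linear layer on grayscale patches; the function names, the shapes, and the use of NumPy are illustrative assumptions, not the chapter's code. The idea shown is that the weights of a fully connected layer can be reshaped into convolution kernels, so that one pass over a full image yields the same scores as classifying every patch separately.

```python
import numpy as np


def dense_patch_classifier(patch, weights, bias):
    """Score a single k x k patch with a fully connected layer.

    patch:   (k, k) array
    weights: (n_classes, k * k) array
    bias:    (n_classes,) array
    """
    return weights @ patch.ravel() + bias


def as_convolution(image, weights, bias, k):
    """Apply the same layer at every k x k window of a 2-D image.

    Returns a score map of shape (H - k + 1, W - k + 1, n_classes),
    i.e. a stride-1 convolution without padding.
    """
    n_classes = weights.shape[0]
    # Each row of the dense weight matrix becomes one k x k kernel.
    kernels = weights.reshape(n_classes, k, k)
    # All k x k windows of the image: shape (H - k + 1, W - k + 1, k, k).
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    # Inner product of every window with every kernel, plus the bias.
    return np.einsum("ijkl,ckl->ijc", windows, kernels) + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(6, 7))
    weights = rng.normal(size=(2, 9))  # 2 classes, 3 x 3 patches
    bias = rng.normal(size=2)

    score_map = as_convolution(image, weights, bias, k=3)
    single = dense_patch_classifier(image[1:4, 2:5], weights, bias)
    # The score at position (1, 2) matches the patch-wise result.
    print(np.allclose(score_map[1, 2], single))
```

In this sketch the score map has one entry per window position, which is what makes it possible to derive bounding boxes from high-scoring regions instead of classifying patches one at a time.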

Keywords

    Convolutional neural networks, Localization, Object detection, Random forests

Cite this

Learning convolutional neural networks for object detection with very little training data. / Reinders, Christoph; Ackermann, Hanno; Yang, Michael Ying et al.
Multimodal Scene Understanding: Algorithms, Applications and Deep Learning. ed. / Michael Ying Yang; Bodo Rosenhahn; Vittorio Murino. Elsevier, 2019. p. 65-100.

Reinders, C, Ackermann, H, Yang, MY & Rosenhahn, B 2019, Learning convolutional neural networks for object detection with very little training data. in M Ying Yang, B Rosenhahn & V Murino (eds), Multimodal Scene Understanding: Algorithms, Applications and Deep Learning. Elsevier, pp. 65-100. https://doi.org/10.1016/b978-0-12-817358-9.00010-x
Reinders, C., Ackermann, H., Yang, M. Y., & Rosenhahn, B. (2019). Learning convolutional neural networks for object detection with very little training data. In M. Ying Yang, B. Rosenhahn, & V. Murino (Eds.), Multimodal Scene Understanding: Algorithms, Applications and Deep Learning (pp. 65-100). Elsevier. https://doi.org/10.1016/b978-0-12-817358-9.00010-x
Reinders C, Ackermann H, Yang MY, Rosenhahn B. Learning convolutional neural networks for object detection with very little training data. In Ying Yang M, Rosenhahn B, Murino V, editors, Multimodal Scene Understanding: Algorithms, Applications and Deep Learning. Elsevier. 2019. p. 65-100 doi: 10.1016/b978-0-12-817358-9.00010-x
Reinders, Christoph ; Ackermann, Hanno ; Yang, Michael Ying et al. / Learning convolutional neural networks for object detection with very little training data. Multimodal Scene Understanding: Algorithms, Applications and Deep Learning. editor / Michael Ying Yang ; Bodo Rosenhahn ; Vittorio Murino. Elsevier, 2019. pp. 65-100
@inbook{52c99bf849324dbeb958e05c7fb52560,
title = "Learning convolutional neural networks for object detection with very little training data",
keywords = "Convolutional neural networks, Localization, Object detection, Random forests",
author = "Christoph Reinders and Hanno Ackermann and Yang, {Michael Ying} and Bodo Rosenhahn",
year = "2019",
month = aug,
day = "2",
doi = "10.1016/b978-0-12-817358-9.00010-x",
language = "English",
pages = "65--100",
editor = "{Ying Yang}, Michael and Bodo Rosenhahn and Vittorio Murino",
booktitle = "Multimodal Scene Understanding",
publisher = "Elsevier",
address = "Netherlands",

}

TY  - CHAP
T1  - Learning convolutional neural networks for object detection with very little training data
AU  - Reinders, Christoph
AU  - Ackermann, Hanno
AU  - Yang, Michael Ying
AU  - Rosenhahn, Bodo
PY  - 2019/8/2
Y1  - 2019/8/2
KW  - Convolutional neural networks
KW  - Localization
KW  - Object detection
KW  - Random forests
UR  - http://www.scopus.com/inward/record.url?scp=85082047720&partnerID=8YFLogxK
U2  - 10.1016/b978-0-12-817358-9.00010-x
DO  - 10.1016/b978-0-12-817358-9.00010-x
M3  - Contribution to book/anthology
AN  - SCOPUS:85082047720
SP  - 65
EP  - 100
BT  - Multimodal Scene Understanding
A2  - Ying Yang, Michael
A2  - Rosenhahn, Bodo
A2  - Murino, Vittorio
PB  - Elsevier
ER  -
