Details
| Original language | English |
| --- | --- |
| Title of host publication | Multimodal Scene Understanding |
| Subtitle of host publication | Algorithms, Applications and Deep Learning |
| Editors | Michael Ying Yang, Bodo Rosenhahn, Vittorio Murino |
| Publisher | Elsevier |
| Pages | 65-100 |
| Number of pages | 36 |
| ISBN (electronic) | 9780128173589 |
| Publication status | Published - 2 Aug 2019 |
Abstract
In recent years, convolutional neural networks have shown great success in various computer vision tasks such as classification, object detection, and scene analysis. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. The need for such large amounts of data, however, limits possible applications: while raw data can be collected quickly, supervised learning additionally requires labels, and labeling data is usually very time-consuming and expensive. This chapter addresses the problem of learning with very little labeled data for extracting information about the infrastructure in urban areas. The aim is to recognize particular traffic signs in crowdsourced data in order to collect information that is of interest to cyclists. The presented system for object detection is trained with very few training examples. To achieve this, the advantages of convolutional neural networks and random forests are combined to learn a patch-wise classifier. In a second step, the random forest is mapped to a neural network, and the classifier is transformed into a fully convolutional network; this significantly accelerates the processing of full images and allows bounding boxes to be predicted. Finally, GPS data is integrated to localize the predictions on the map, and multiple observations are merged to further improve the localization accuracy. In comparison to Faster R-CNN and other networks for object detection, as well as algorithms for transfer learning, the required amount of labeled data is considerably reduced.
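The key step the abstract describes, mapping a trained random forest to a neural network, follows a classical construction (one hidden layer of split units and one hidden layer of leaf units that AND the path decisions). The sketch below illustrates the idea for a single decision tree; it is a minimal illustrative example, not the authors' implementation, and the toy data and function names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy 1-D data standing in for patch features; any separable data works.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
t = tree.tree_  # low-level arrays: children_left/right, feature, threshold, value


def leaf_paths(node=0, path=()):
    """Enumerate (leaf_id, [(split_id, went_left), ...]) for every leaf."""
    if t.children_left[node] == -1:  # sklearn marks leaves with -1
        return [(node, list(path))]
    return (leaf_paths(t.children_left[node], path + ((node, True),)) +
            leaf_paths(t.children_right[node], path + ((node, False),)))


def nn_predict(x):
    """Evaluate the tree as a two-hidden-layer network:
    layer 1 = one hard-threshold unit per split node,
    layer 2 = one unit per leaf, firing iff all path decisions hold,
    output  = the class stored at the single active leaf."""
    # Layer 1: split units compute x[feature] <= threshold.
    h1 = {n: bool(x[t.feature[n]] <= t.threshold[n])
          for n in range(t.node_count) if t.children_left[n] != -1}
    # Layer 2: a leaf unit is an AND over the decisions on its path.
    for leaf, path in leaf_paths():
        if all(h1[n] == went_left for n, went_left in path):
            return int(np.argmax(t.value[leaf]))


# The network view agrees with the tree on every input by construction.
assert all(nn_predict(r) == tree.predict([r])[0] for r in X)
```

In the chapter's pipeline this mapping is what allows the whole classifier to be expressed as network layers and then converted into a fully convolutional network, so that one forward pass scores every patch of a full image at once.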
Keywords
- Convolutional neural networks, Localization, Object detection, Random forests
ASJC Scopus subject areas
- Computer Science (all)
- General Computer Science
Cite this
Multimodal Scene Understanding: Algorithms, Applications and Deep Learning. ed. / Michael Ying Yang; Bodo Rosenhahn; Vittorio Murino. Elsevier, 2019. p. 65-100.
Research output: Chapter in book/report/conference proceeding › Contribution to book/anthology › Research › peer review
TY - CHAP
T1 - Learning convolutional neural networks for object detection with very little training data
AU - Reinders, Christoph
AU - Ackermann, Hanno
AU - Yang, Michael Ying
AU - Rosenhahn, Bodo
PY - 2019/8/2
Y1 - 2019/8/2
AB - In recent years, convolutional neural networks have shown great success in various computer vision tasks such as classification, object detection, and scene analysis. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. The need for such large amounts of data, however, limits possible applications: while raw data can be collected quickly, supervised learning additionally requires labels, and labeling data is usually very time-consuming and expensive. This chapter addresses the problem of learning with very little labeled data for extracting information about the infrastructure in urban areas. The aim is to recognize particular traffic signs in crowdsourced data in order to collect information that is of interest to cyclists. The presented system for object detection is trained with very few training examples. To achieve this, the advantages of convolutional neural networks and random forests are combined to learn a patch-wise classifier. In a second step, the random forest is mapped to a neural network, and the classifier is transformed into a fully convolutional network; this significantly accelerates the processing of full images and allows bounding boxes to be predicted. Finally, GPS data is integrated to localize the predictions on the map, and multiple observations are merged to further improve the localization accuracy. In comparison to Faster R-CNN and other networks for object detection, as well as algorithms for transfer learning, the required amount of labeled data is considerably reduced.
KW - Convolutional neural networks
KW - Localization
KW - Object detection
KW - Random forests
UR - http://www.scopus.com/inward/record.url?scp=85082047720&partnerID=8YFLogxK
U2 - 10.1016/b978-0-12-817358-9.00010-x
DO - 10.1016/b978-0-12-817358-9.00010-x
M3 - Contribution to book/anthology
AN - SCOPUS:85082047720
SP - 65
EP - 100
BT - Multimodal Scene Understanding
A2 - Yang, Michael Ying
A2 - Rosenhahn, Bodo
A2 - Murino, Vittorio
PB - Elsevier
ER -