Details
Original language | English |
---|---|
Title of host publication | 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014 |
Publisher | Association for Computing Machinery (ACM) |
ISBN (Print) | 9781450325387 |
Publication status | Published - 2 Jun 2014 |
Event | 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014 - Thessaloniki, Greece Duration: 2 Jun 2014 → 4 Jun 2014 |
Publication series
Name | ACM International Conference Proceeding Series |
---|
Abstract
Crowdsourcing has become ubiquitous in machine learning as a cost-effective method to gather training labels. In this paper we examine the challenges that appear when employing crowdsourcing for active learning, in an integrated environment where an automatic method and human labelers work together towards improving their performance at a certain task. By using Active Learning techniques on crowd-labeled data, we optimize the performance of the automatic method towards better accuracy, while keeping the costs low by gathering data on demand. In order to verify our proposed methods, we apply them to the task of deduplication of publications in a digital library by examining metadata. We investigate the problems created by noisy labels produced by the crowd and explore methods to aggregate them. We analyze how different automatic methods are affected by the quantity and quality of the allocated resources as well as the instance selection strategies for each active learning round, aiming towards attaining a balance between cost and performance.
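The two core ideas in the abstract, aggregating noisy crowd labels and selecting which instances to send to the crowd in each active learning round, can be sketched roughly as below. This is an illustrative example only (simple majority voting plus uncertainty sampling); it is not the exact aggregation or selection method evaluated in the paper, and all names and data are hypothetical.

```python
from collections import Counter

def aggregate_majority(crowd_labels):
    """Aggregate noisy crowd labels per instance by majority vote.

    crowd_labels: dict mapping instance id -> list of worker labels.
    Returns dict mapping instance id -> (winning label, agreement ratio),
    where the agreement ratio hints at label quality.
    """
    aggregated = {}
    for instance, labels in crowd_labels.items():
        label, count = Counter(labels).most_common(1)[0]
        aggregated[instance] = (label, count / len(labels))
    return aggregated

def select_most_uncertain(probabilities):
    """Uncertainty sampling: pick the instance whose predicted
    positive-class probability is closest to 0.5, i.e. the one the
    automatic method is least sure about and would most benefit
    from having labeled by the crowd.

    probabilities: dict mapping instance id -> P(duplicate).
    """
    return min(probabilities, key=lambda i: abs(probabilities[i] - 0.5))

# Hypothetical example: three workers label candidate publication
# pairs as duplicate (1) or not (0).
votes = {"a": [1, 1, 0], "b": [0, 0, 0], "c": [1, 0, 1]}
print(aggregate_majority(votes))

# The classifier is most unsure about pair "c", so "c" is sent to
# the crowd in the next active learning round.
print(select_most_uncertain({"a": 0.9, "b": 0.1, "c": 0.55}))  # → 'c'
```

In a full loop, the aggregated crowd labels for the selected instances would be added to the training set and the classifier retrained each round, trading labeling cost against accuracy as the abstract describes.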
ASJC Scopus subject areas
- Computer Science (all)
- Software
- Computer Science (all)
- Human-Computer Interaction
- Computer Science (all)
- Computer Vision and Pattern Recognition
- Computer Science (all)
- Computer Networks and Communications
Cite this
- Standard
- Harvard
- APA
- Vancouver
- BibTeX
- RIS
4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014. Association for Computing Machinery (ACM), 2014. (ACM International Conference Proceeding Series).
Publication: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-review
TY - GEN
T1 - When in Doubt Ask the Crowd
T2 - 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014
AU - Georgescu, Mihai
AU - Pham, Dang Duc
AU - Firan, Claudiu S.
AU - Gadiraju, Ujwal
AU - Nejdl, Wolfgang
PY - 2014/6/2
Y1 - 2014/6/2
N2 - Crowdsourcing has become ubiquitous in machine learning as a cost effective method to gather training labels. In this paper we examine the challenges that appear when employing crowdsourcing for active learning, in an integrated environment where an automatic method and human labelers work together towards improving their performance at a certain task. By using Active Learning techniques on crowd-labeled data, we optimize the performance of the automatic method towards better accuracy, while keeping the costs low by gathering data on demand. In order to verify our proposed methods, we apply them to the task of deduplication of publications in a digital library by examining metadata. We investigate the problems created by noisy labels produced by the crowd and explore methods to aggregate them. We analyze how different automatic methods are affected by the quantity and quality of the allocated resources as well as the instance selection strategies for each active learning round, aiming towards attaining a balance between cost and performance.
AB - Crowdsourcing has become ubiquitous in machine learning as a cost effective method to gather training labels. In this paper we examine the challenges that appear when employing crowdsourcing for active learning, in an integrated environment where an automatic method and human labelers work together towards improving their performance at a certain task. By using Active Learning techniques on crowd-labeled data, we optimize the performance of the automatic method towards better accuracy, while keeping the costs low by gathering data on demand. In order to verify our proposed methods, we apply them to the task of deduplication of publications in a digital library by examining metadata. We investigate the problems created by noisy labels produced by the crowd and explore methods to aggregate them. We analyze how different automatic methods are affected by the quantity and quality of the allocated resources as well as the instance selection strategies for each active learning round, aiming towards attaining a balance between cost and performance.
KW - Active Learning
KW - Crowdsourcing
KW - Human Computation
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=84903649754&partnerID=8YFLogxK
U2 - 10.1145/2611040.2611047
DO - 10.1145/2611040.2611047
M3 - Conference contribution
AN - SCOPUS:84903649754
SN - 9781450325387
T3 - ACM International Conference Proceeding Series
BT - 4th International Conference on Web Intelligence, Mining and Semantics, WIMS 2014
PB - Association for Computing Machinery (ACM)
Y2 - 2 June 2014 through 4 June 2014
ER -