Details
Original language | English |
---|---|
Title of host publication | Proceedings of the Sixth International Joint Conference on Natural Language Processing |
Editors | Ruslan Mitkov, Jong C. Park |
Pages | 534-542 |
Number of pages | 9 |
ISBN (electronic) | 9784990734800 |
Publication status | Published - Oct 2013 |
Externally published | Yes |
Event | 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan; Duration: 14 Oct 2013 → 18 Oct 2013 |
Abstract
From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
ASJC Scopus subject areas
- Computer Science(all)
- Artificial Intelligence
- Software
Proceedings of the Sixth International Joint Conference on Natural Language Processing. ed. / Ruslan Mitkov; Jong C. Park. 2013. p. 534-542.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research
TY - GEN
T1 - Learning Efficient Information Extraction on Heterogeneous Texts
AU - Wachsmuth, Henning
AU - Stein, Benno
AU - Engels, Gregor
N1 - Funding information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS11016A.
PY - 2013/10
Y1 - 2013/10
UR - http://www.scopus.com/inward/record.url?scp=84977887101&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84977887101
SP - 534
EP - 542
BT - Proceedings of the Sixth International Joint Conference on Natural Language Processing
A2 - Mitkov, Ruslan
A2 - Park, Jong C.
T2 - 6th International Joint Conference on Natural Language Processing, IJCNLP 2013
Y2 - 14 October 2013 through 18 October 2013
ER -