Learning Efficient Information Extraction on Heterogeneous Texts

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Authors

  • Henning Wachsmuth
  • Benno Stein
  • Gregor Engels

External Research Organisations

  • Paderborn University
  • Bauhaus-Universität Weimar

Details

Original language: English
Title of host publication: Proceedings of the Sixth International Joint Conference on Natural Language Processing
Editors: Ruslan Mitkov, Jong C. Park
Pages: 534-542
Number of pages: 9
ISBN (electronic): 9784990734800
Publication status: Published - Oct 2013
Externally published: Yes
Event: 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan
Duration: 14 Oct 2013 to 18 Oct 2013

Abstract

From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
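The abstract does not spell out the learning method, so the following is only a rough illustration of the two ideas it names, not the authors' implementation. It assumes a toy pipeline of three hypothetical filters (`money`, `time`, `org`) with made-up per-portion costs: a text portion stays relevant only if every filter accepts it, so all schedules (orders) return the same output but differ in simulated run-time. The per-text schedule choice is a swapped-in technique: one online linear regressor per schedule (normalized LMS) predicts run-time from hypothetical surface features, and the observed run-time of the executed schedule serves as the self-supervised label. All function and class names (`run_schedule`, `features`, `ScheduleChooser`, `process_stream`) are illustrative.

```python
from itertools import permutations

# Hypothetical filters: name -> (cost per processed portion, acceptance test).
# A portion counts as relevant only if every filter accepts it, so every order
# (schedule) of the filters yields the same output; only the cost differs.
FILTERS = {
    "money": (2.0, lambda p: "$" in p),
    "time": (1.0, lambda p: any(ch.isdigit() for ch in p)),
    "org": (3.0, lambda p: "Inc" in p),
}
SCHEDULES = list(permutations(FILTERS))  # all 6 orderings of the filter names


def run_schedule(schedule, portions):
    """Run the filters in order; return (relevant portions, simulated run-time)."""
    cost, remaining = 0.0, list(portions)
    for name in schedule:
        unit_cost, accept = FILTERS[name]
        cost += unit_cost * len(remaining)  # each remaining portion is processed
        remaining = [p for p in remaining if accept(p)]  # order-preserving filter
    return remaining, cost


def features(portions):
    """Assumed text features: bias, number of portions, digit and dollar counts."""
    digits = sum(any(ch.isdigit() for ch in p) for p in portions)
    dollars = sum("$" in p for p in portions)
    return [1.0, float(len(portions)), float(digits), float(dollars)]


class ScheduleChooser:
    """One online run-time regressor per schedule; pick the lowest prediction."""

    def __init__(self, rate=0.5):
        self.rate = rate  # NLMS step size; the update is stable for 0 < rate < 2
        self.weights = {s: [0.0] * 4 for s in SCHEDULES}

    def predict(self, schedule, x):
        return sum(w * xi for w, xi in zip(self.weights[schedule], x))

    def choose(self, x):
        # Untried schedules predict 0, so each gets tried once early (optimism).
        return min(SCHEDULES, key=lambda s: self.predict(s, x))

    def update(self, schedule, x, observed):
        # Self-supervised label: the run-time just observed for this text.
        error = observed - self.predict(schedule, x)
        norm = sum(xi * xi for xi in x)  # >= 1 because of the bias feature
        w = self.weights[schedule]
        for i, xi in enumerate(x):
            w[i] += self.rate * error * xi / norm


def process_stream(texts, chooser):
    """Pick a schedule per text, run it, and learn from the observed cost."""
    results = []
    for portions in texts:
        x = features(portions)
        schedule = chooser.choose(x)
        relevant, cost = run_schedule(schedule, portions)
        chooser.update(schedule, x, cost)
        results.append((schedule, relevant, cost))
    return results


if __name__ == "__main__":
    stream = [
        ["Acme Inc paid $5", "hello", "in 2013"],
        ["no numbers here", "Beta Inc", "Gamma Inc paid $7"],
    ] * 5
    for schedule, relevant, cost in process_stream(stream, ScheduleChooser()):
        print(schedule, relevant, cost)
```

Because every prediction starts at zero, the chooser tries each schedule once before it exploits what it has learned; a practical system would need a better exploration strategy and, unlike this toy, features that are much cheaper to compute than the filters themselves.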

Cite this

Learning Efficient Information Extraction on Heterogeneous Texts. / Wachsmuth, Henning; Stein, Benno; Engels, Gregor.
Proceedings of the Sixth International Joint Conference on Natural Language Processing. ed. / Ruslan Mitkov; Jong C. Park. 2013. p. 534-542.

Wachsmuth, H, Stein, B & Engels, G 2013, Learning Efficient Information Extraction on Heterogeneous Texts. in R Mitkov & JC Park (eds), Proceedings of the Sixth International Joint Conference on Natural Language Processing. pp. 534-542, 6th International Joint Conference on Natural Language Processing, IJCNLP 2013, Nagoya, Japan, 14 Oct 2013. <https://aclanthology.org/I13-1061>
Wachsmuth, H., Stein, B., & Engels, G. (2013). Learning Efficient Information Extraction on Heterogeneous Texts. In R. Mitkov & J. C. Park (Eds.), Proceedings of the Sixth International Joint Conference on Natural Language Processing (pp. 534-542). https://aclanthology.org/I13-1061
Wachsmuth H, Stein B, Engels G. Learning Efficient Information Extraction on Heterogeneous Texts. In Mitkov R, Park JC, editors, Proceedings of the Sixth International Joint Conference on Natural Language Processing. 2013. p. 534-542
Wachsmuth, Henning ; Stein, Benno ; Engels, Gregor. / Learning Efficient Information Extraction on Heterogeneous Texts. Proceedings of the Sixth International Joint Conference on Natural Language Processing. editor / Ruslan Mitkov ; Jong C. Park. 2013. pp. 534-542
@inproceedings{ab6dad213f4e4839839a721844617d24,
title = "Learning Efficient Information Extraction on Heterogeneous Texts",
abstract = "From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
note = "Funding information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS11016A.; 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 ; Conference date: 14-10-2013 Through 18-10-2013",
year = "2013",
month = oct,
language = "English",
pages = "534--542",
editor = "Mitkov, Ruslan and Park, {Jong C.}",
booktitle = "Proceedings of the Sixth International Joint Conference on Natural Language Processing",
isbn = "9784990734800",
url = "https://aclanthology.org/I13-1061",
}

TY  - GEN
T1  - Learning Efficient Information Extraction on Heterogeneous Texts
AU  - Wachsmuth, Henning
AU  - Stein, Benno
AU  - Engels, Gregor
N1  - Funding information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS11016A.
PY  - 2013/10
Y1  - 2013/10
N2  - From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
AB  - From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
UR  - http://www.scopus.com/inward/record.url?scp=84977887101&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84977887101
SP  - 534
EP  - 542
BT  - Proceedings of the Sixth International Joint Conference on Natural Language Processing
A2  - Mitkov, Ruslan
A2  - Park, Jong C.
T2  - 6th International Joint Conference on Natural Language Processing, IJCNLP 2013
Y2  - 14 October 2013 through 18 October 2013
ER  - 