Learning Efficient Information Extraction on Heterogeneous Texts

Henning Wachsmuth; Benno Stein; Gregor Engels

Details

Original language	English
Title of host publication	Proceedings of the Sixth International Joint Conference on Natural Language Processing
Editors	Ruslan Mitkov, Jong C. Park
Pages	534-542
Number of pages	9
ISBN (electronic)	9784990734800
Publication status	Published - Oct 2013
Externally published	Yes
Event	6th International Joint Conference on Natural Language Processing, IJCNLP 2013 - Nagoya, Japan
Duration	14 Oct 2013 → 18 Oct 2013

Abstract

From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.

ASJC Scopus subject areas

Computer Science(all)
Artificial Intelligence
Software

Cite this

Learning Efficient Information Extraction on Heterogeneous Texts. / Wachsmuth, Henning; Stein, Benno; Engels, Gregor.
Proceedings of the Sixth International Joint Conference on Natural Language Processing. ed. / Ruslan Mitkov; Jong C. Park. 2013. p. 534-542.

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Wachsmuth, H, Stein, B & Engels, G 2013, Learning Efficient Information Extraction on Heterogeneous Texts. in R Mitkov & JC Park (eds), Proceedings of the Sixth International Joint Conference on Natural Language Processing. pp. 534-542, 6th International Joint Conference on Natural Language Processing, IJCNLP 2013, Nagoya, Japan, 14 Oct 2013. <https://aclanthology.org/I13-1061>

Wachsmuth, H., Stein, B., & Engels, G. (2013). Learning Efficient Information Extraction on Heterogeneous Texts. In R. Mitkov, & J. C. Park (Eds.), Proceedings of the Sixth International Joint Conference on Natural Language Processing (pp. 534-542). https://aclanthology.org/I13-1061

Wachsmuth H, Stein B, Engels G. Learning Efficient Information Extraction on Heterogeneous Texts. In Mitkov R, Park JC, editors, Proceedings of the Sixth International Joint Conference on Natural Language Processing. 2013. p. 534-542

Wachsmuth, Henning ; Stein, Benno ; Engels, Gregor. / Learning Efficient Information Extraction on Heterogeneous Texts. Proceedings of the Sixth International Joint Conference on Natural Language Processing. editor / Ruslan Mitkov ; Jong C. Park. 2013. pp. 534-542

@inproceedings{ab6dad213f4e4839839a721844617d24,

title = "Learning Efficient Information Extraction on Heterogeneous Texts",

abstract = "From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.",

author = "Henning Wachsmuth and Benno Stein and Gregor Engels",

note = "Funding information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS11016A.; 6th International Joint Conference on Natural Language Processing, IJCNLP 2013 ; Conference date: 14-10-2013 Through 18-10-2013",

year = "2013",

month = oct,

language = "English",

pages = "534--542",

editor = "Ruslan Mitkov and Park, {Jong C.}",

booktitle = "Proceedings of the Sixth International Joint Conference on Natural Language Processing",

}

TY - GEN

T1 - Learning Efficient Information Extraction on Heterogeneous Texts

AU - Wachsmuth, Henning

AU - Stein, Benno

AU - Engels, Gregor

N1 - Funding information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS11016A.

PY - 2013/10

Y1 - 2013/10

AB - From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-times of efficient schedules as a function of the heterogeneity of the texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.

UR - http://www.scopus.com/inward/record.url?scp=84977887101&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84977887101

SP - 534

EP - 542

BT - Proceedings of the Sixth International Joint Conference on Natural Language Processing

A2 - Mitkov, Ruslan

A2 - Park, Jong C.

T2 - 6th International Joint Conference on Natural Language Processing, IJCNLP 2013

Y2 - 14 October 2013 through 18 October 2013

ER -
