Constructing Efficient Information Extraction Pipelines

Henning Wachsmuth; Benno Stein; Gregor Engels

doi:10.1145/2063576.2063935

Details

Original language	English
Title of host publication	CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
Place of Publication	New York
Publisher	Association for Computing Machinery (ACM)
Pages	2237-2240
Number of pages	4
ISBN (print)	9781450307178
Publication status	Published - Oct 2011
Externally published	Yes
Event	20th ACM Conference on Information and Knowledge Management, CIKM'11 - Glasgow, United Kingdom (UK)
Duration	24 Oct 2011 → 28 Oct 2011

Abstract

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.

Keywords

information extraction, run-time efficiency

ASJC Scopus subject areas

Decision Sciences (all)
General Decision Sciences
Business, Management and Accounting (all)
General Business, Management and Accounting

Cite this

Constructing Efficient Information Extraction Pipelines. / Wachsmuth, Henning; Stein, Benno; Engels, Gregor.
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York: Association for Computing Machinery (ACM), 2011. p. 2237-2240.

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Wachsmuth, H, Stein, B & Engels, G 2011, Constructing Efficient Information Extraction Pipelines. in CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. Association for Computing Machinery (ACM), New York, pp. 2237-2240, 20th ACM Conference on Information and Knowledge Management, CIKM'11, Glasgow, United Kingdom (UK), 24 Oct 2011. https://doi.org/10.1145/2063576.2063935

Wachsmuth, H., Stein, B., & Engels, G. (2011). Constructing Efficient Information Extraction Pipelines. In CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management (pp. 2237-2240). Association for Computing Machinery (ACM). https://doi.org/10.1145/2063576.2063935

Wachsmuth H, Stein B, Engels G. Constructing Efficient Information Extraction Pipelines. In CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York: Association for Computing Machinery (ACM). 2011. p. 2237-2240 doi: 10.1145/2063576.2063935

Wachsmuth, Henning ; Stein, Benno ; Engels, Gregor. / Constructing Efficient Information Extraction Pipelines. CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York : Association for Computing Machinery (ACM), 2011. pp. 2237-2240

@inproceedings{546437cc6b654257b4d4c722f3a093ea,
title = "Constructing Efficient Information Extraction Pipelines",
abstract = "Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much {"}efficiency potential{"} depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.",
keywords = "information extraction, run-time efficiency",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
year = "2011",
month = oct,
doi = "10.1145/2063576.2063935",
language = "English",
isbn = "9781450307178",
pages = "2237--2240",
booktitle = "CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management",
publisher = "Association for Computing Machinery (ACM)",
address = "New York",
note = "20th ACM Conference on Information and Knowledge Management, CIKM'11 ; Conference date: 24-10-2011 Through 28-10-2011",
}

TY - GEN
T1 - Constructing Efficient Information Extraction Pipelines
AU - Wachsmuth, Henning
AU - Stein, Benno
AU - Engels, Gregor
PY - 2011/10
Y1 - 2011/10
N2 - Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
AB - Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
KW - information extraction
KW - run-time efficiency
UR - http://www.scopus.com/inward/record.url?scp=83055186740&partnerID=8YFLogxK
U2 - 10.1145/2063576.2063935
DO - 10.1145/2063576.2063935
M3 - Conference contribution
AN - SCOPUS:83055186740
SN - 9781450307178
SP - 2237
EP - 2240
BT - CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
PB - Association for Computing Machinery (ACM)
CY - New York
T2 - 20th ACM Conference on Information and Knowledge Management, CIKM'11
Y2 - 24 October 2011 through 28 October 2011
ER -

Research@Leibniz University

By the same author(s)

Improving Argument Effectiveness Across Ideologies using Instruction-tuned Large Language Models

Towards Modeling and Evaluating Instructional Explanations in Teacher-Student Dialogues

Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness

Mehrebenenannotation argumentativer Lerner∗innentexte für die automatische Textauswertung

When to use a metaphor: Metaphors in dialogical explanations with addressees of different expertise
