Constructing Efficient Information Extraction Pipelines

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Authors

External Research Organisations

  • Paderborn University
  • Bauhaus-Universität Weimar

Details

Original language: English
Title of host publication: CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 2237-2240
Number of pages: 4
ISBN (print): 9781450307178
Publication status: Published - Oct 2011
Externally published: Yes
Event: 20th ACM Conference on Information and Knowledge Management, CIKM'11 - Glasgow, United Kingdom (UK)
Duration: 24 Oct 2011 to 28 Oct 2011

Abstract

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.

Keywords

    information extraction, run-time efficiency

Cite this

Constructing Efficient Information Extraction Pipelines. / Wachsmuth, Henning; Stein, Benno; Engels, Gregor.
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York: Association for Computing Machinery (ACM), 2011. p. 2237-2240.

Wachsmuth, H, Stein, B & Engels, G 2011, Constructing Efficient Information Extraction Pipelines. in CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. Association for Computing Machinery (ACM), New York, pp. 2237-2240, 20th ACM Conference on Information and Knowledge Management, CIKM'11, Glasgow, United Kingdom (UK), 24 Oct 2011. https://doi.org/10.1145/2063576.2063935
Wachsmuth, H., Stein, B., & Engels, G. (2011). Constructing Efficient Information Extraction Pipelines. In CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management (pp. 2237-2240). Association for Computing Machinery (ACM). https://doi.org/10.1145/2063576.2063935
Wachsmuth H, Stein B, Engels G. Constructing Efficient Information Extraction Pipelines. In CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York: Association for Computing Machinery (ACM). 2011. p. 2237-2240 doi: 10.1145/2063576.2063935
Wachsmuth, Henning ; Stein, Benno ; Engels, Gregor. / Constructing Efficient Information Extraction Pipelines. CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York : Association for Computing Machinery (ACM), 2011. pp. 2237-2240
@inproceedings{546437cc6b654257b4d4c722f3a093ea,
title = "Constructing Efficient Information Extraction Pipelines",
abstract = "Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much {"}efficiency potential{"} depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.",
keywords = "information extraction, run-time efficiency",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
year = "2011",
month = oct,
doi = "10.1145/2063576.2063935",
language = "English",
isbn = "9781450307178",
pages = "2237--2240",
booktitle = "CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management",
publisher = "Association for Computing Machinery (ACM)",
address = "New York",
note = "20th ACM Conference on Information and Knowledge Management, CIKM'11 ; Conference date: 24-10-2011 Through 28-10-2011",

}

TY - GEN
T1 - Constructing Efficient Information Extraction Pipelines
AU - Wachsmuth, Henning
AU - Stein, Benno
AU - Engels, Gregor
PY - 2011/10
Y1 - 2011/10

N2 - Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.

AB - Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.

KW - information extraction
KW - run-time efficiency
UR - http://www.scopus.com/inward/record.url?scp=83055186740&partnerID=8YFLogxK
U2 - 10.1145/2063576.2063935
DO - 10.1145/2063576.2063935
M3 - Conference contribution
AN - SCOPUS:83055186740
SN - 9781450307178
SP - 2237
EP - 2240
BT - CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
PB - Association for Computing Machinery (ACM)
CY - New York
T2 - 20th ACM Conference on Information and Knowledge Management, CIKM'11
Y2 - 24 October 2011 through 28 October 2011
ER -
