Constructing Efficient Information Extraction Pipelines

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research

Authors

  • Henning Wachsmuth
  • Benno Stein
  • Gregor Engels

External organizations

  • Universität Paderborn
  • Bauhaus-Universität Weimar

Details

Original language: English
Title of host publication: CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
Place of publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 2237-2240
Number of pages: 4
ISBN (print): 9781450307178
Publication status: Published - Oct 2011
Externally published: Yes
Event: 20th ACM Conference on Information and Knowledge Management, CIKM'11 - Glasgow, United Kingdom
Duration: 24 Oct 2011 to 28 Oct 2011

Abstract

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
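
To make the scheduling question concrete, the following is a minimal sketch and not the method of the paper. It assumes a simplified cost model in which every stage acts as an independent filter with a fixed cost per text unit and a fixed fraction of text it passes on; the stage names, costs, and fractions are hypothetical values chosen only for illustration. Under these assumptions the text that survives all filters is the same for every ordering, so only the run-time changes.

```python
from itertools import permutations

# Hypothetical stages: (name, cost per text unit, fraction of text passed on).
# These numbers are illustrative assumptions, not measurements from the paper.
STAGES = [
    ("time_recognizer", 1.0, 0.4),
    ("money_recognizer", 2.0, 0.2),
    ("forecast_classifier", 5.0, 0.5),
]


def expected_cost(schedule):
    """Expected run-time per input text unit for one ordering of the stages."""
    remaining = 1.0  # fraction of the input still considered relevant
    total = 0.0
    for _name, cost, passed_on in schedule:
        total += remaining * cost  # a stage only processes what earlier stages kept
        remaining *= passed_on     # later stages see less text
    return total


def best_schedule(stages):
    """Exhaustively pick the cheapest ordering (feasible only for few stages)."""
    return min(permutations(stages), key=expected_cost)


def worst_schedule(stages):
    """Pick the most expensive ordering, for comparison."""
    return max(permutations(stages), key=expected_cost)


if __name__ == "__main__":
    for label, schedule in (("best", best_schedule(STAGES)),
                            ("worst", worst_schedule(STAGES))):
        names = " -> ".join(name for name, _, _ in schedule)
        print(f"{label}: {names} (cost {expected_cost(schedule):.2f})")
```

In this toy setting the cheapest ordering runs the cheap, strongly filtering stages first. Real pipelines also have dependencies between algorithms that restrict the admissible orderings, which the sketch ignores.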

Cite this

Constructing Efficient Information Extraction Pipelines. / Wachsmuth, Henning; Stein, Benno; Engels, Gregor.
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York: Association for Computing Machinery (ACM), 2011. pp. 2237-2240.

Wachsmuth, H, Stein, B & Engels, G 2011, Constructing Efficient Information Extraction Pipelines. in CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. Association for Computing Machinery (ACM), New York, pp. 2237-2240, 20th ACM Conference on Information and Knowledge Management, CIKM'11, Glasgow, United Kingdom, 24 Oct 2011. https://doi.org/10.1145/2063576.2063935
Wachsmuth, H., Stein, B., & Engels, G. (2011). Constructing Efficient Information Extraction Pipelines. In CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management (pp. 2237-2240). Association for Computing Machinery (ACM). https://doi.org/10.1145/2063576.2063935
Wachsmuth H, Stein B, Engels G. Constructing Efficient Information Extraction Pipelines. In CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York: Association for Computing Machinery (ACM). 2011. pp. 2237-2240. doi: 10.1145/2063576.2063935
Wachsmuth, Henning ; Stein, Benno ; Engels, Gregor. / Constructing Efficient Information Extraction Pipelines. CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York : Association for Computing Machinery (ACM), 2011. pp. 2237-2240.
@inproceedings{546437cc6b654257b4d4c722f3a093ea,
title = "Constructing Efficient Information Extraction Pipelines",
abstract = "Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much {"}efficiency potential{"} depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.",
keywords = "information extraction, run-time efficiency",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
year = "2011",
month = oct,
doi = "10.1145/2063576.2063935",
language = "English",
isbn = "9781450307178",
pages = "2237--2240",
booktitle = "CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",
note = "20th ACM Conference on Information and Knowledge Management, CIKM'11 ; Conference date: 24-10-2011 Through 28-10-2011",

}

TY - GEN

T1 - Constructing Efficient Information Extraction Pipelines

AU - Wachsmuth, Henning

AU - Stein, Benno

AU - Engels, Gregor

PY - 2011/10

Y1 - 2011/10

AB - Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.

KW - information extraction

KW - run-time efficiency

UR - http://www.scopus.com/inward/record.url?scp=83055186740&partnerID=8YFLogxK

U2 - 10.1145/2063576.2063935

DO - 10.1145/2063576.2063935

M3 - Conference contribution

AN - SCOPUS:83055186740

SN - 9781450307178

SP - 2237

EP - 2240

BT - CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

PB - Association for Computing Machinery (ACM)

CY - New York

T2 - 20th ACM Conference on Information and Knowledge Management, CIKM'11

Y2 - 24 October 2011 through 28 October 2011

ER -
