Details
Original language | English |
---|---|
Title of host publication | CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management |
Place of Publication | New York |
Publisher | Association for Computing Machinery (ACM) |
Pages | 2237-2240 |
Number of pages | 4 |
ISBN (print) | 9781450307178 |
Publication status | Published - Oct 2011 |
Externally published | Yes |
Event | 20th ACM Conference on Information and Knowledge Management, CIKM'11 - Glasgow, United Kingdom (UK). Duration: 24 Oct 2011 → 28 Oct 2011 |
Abstract
Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
Keywords
- information extraction, run-time efficiency
ASJC Scopus subject areas
- Decision Sciences (all)
- General Decision Sciences
- Business, Management and Accounting (all)
- General Business, Management and Accounting
Cite this
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management. New York: Association for Computing Machinery (ACM), 2011. p. 2237-2240.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research
TY - GEN
T1 - Constructing Efficient Information Extraction Pipelines
AU - Wachsmuth, Henning
AU - Stein, Benno
AU - Engels, Gregor
PY - 2011/10
Y1 - 2011/10
KW - information extraction
KW - run-time efficiency
UR - http://www.scopus.com/inward/record.url?scp=83055186740&partnerID=8YFLogxK
U2 - 10.1145/2063576.2063935
DO - 10.1145/2063576.2063935
M3 - Conference contribution
AN - SCOPUS:83055186740
SN - 9781450307178
SP - 2237
EP - 2240
BT - CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
PB - Association for Computing Machinery (ACM)
CY - New York
T2 - 20th ACM Conference on Information and Knowledge Management, CIKM'11
Y2 - 24 October 2011 through 28 October 2011
ER -