Information Extraction as a Filtering Task

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Authors

  • Henning Wachsmuth
  • Benno Stein
  • Gregor Engels

External Research Organisations

  • Paderborn University
  • Bauhaus-Universität Weimar

Details

Original language: English
Title of host publication: CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
Place of publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 2049-2058
Number of pages: 10
ISBN (print): 9781450322638
Publication status: Published - 27 Oct 2013
Externally published: Yes
Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
Duration: 27 Oct 2013 - 1 Nov 2013

Abstract

Information extraction is usually approached as an annotation task: Input texts run through several analysis steps of an extraction process in which different semantic concepts are annotated and matched against the slots of templates. We argue that such an approach lacks an efficient control of the input of the analysis steps. In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others. We model all dependencies between the semantic concepts sought for with a truth maintenance system, which then efficiently infers the portions of text to be annotated in each analysis step. The filtering view enables an information extraction system (1) to annotate only relevant portions of input texts and (2) to easily trade its run-time efficiency for its recall. We provide our approach as an open-source extension of Apache UIMA and we show the potential of our approach in a number of experiments. Copyright is held by the owner/author(s).
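
The filtering view described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' truth maintenance system and does not use the Apache UIMA API; the regex annotators, the sentence-level granularity, and the names `find_money`, `find_year`, `find_org`, and `extract` are all assumptions made for illustration. The sketch assumes a template that needs all three concepts in one sentence, so a sentence that lacks any of them is discarded before the later steps run.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy annotators: each returns the mentions of one concept in a sentence.
def find_money(sentence: str) -> List[str]:
    return re.findall(r"\$\d+", sentence)

def find_year(sentence: str) -> List[str]:
    return re.findall(r"\b(?:19|20)\d{2}\b", sentence)

def find_org(sentence: str) -> List[str]:
    return re.findall(r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\b", sentence)

Step = Tuple[str, Callable[[str], List[str]]]

def extract(sentences: List[str], steps: List[Step]) -> Tuple[List[Dict[str, object]], int]:
    """Run the steps in order, annotating only sentences that are still relevant.

    A sentence stays relevant only while every concept sought so far was found
    in it. Returns the filled templates and the number of annotator calls made.
    """
    relevant: Dict[int, Dict[str, List[str]]] = {i: {} for i in range(len(sentences))}
    calls = 0
    for concept, annotate in steps:
        survivors: Dict[int, Dict[str, List[str]]] = {}
        for i, found in relevant.items():
            calls += 1
            mentions = annotate(sentences[i])
            if mentions:  # keep the sentence; otherwise it is filtered out
                survivors[i] = {**found, concept: mentions}
        relevant = survivors
    templates = [{"sentence": sentences[i], **found} for i, found in relevant.items()]
    return templates, calls

SENTENCES = [
    "Acme Corp earned $5 in 2013.",
    "The weather was nice.",
    "Revenue hit $7 last year.",
    "Founded in 1999, Bolt Inc grew.",
]
STEPS: List[Step] = [("money", find_money), ("year", find_year), ("org", find_org)]

if __name__ == "__main__":
    templates, calls = extract(SENTENCES, STEPS)
    print(templates)
    print("annotator calls:", calls)
```

On this toy input the filtered pipeline makes 7 annotator calls, whereas annotating every sentence in every step would take 12. Filtering on a coarser unit than the sentence (for example, the paragraph) would keep more text for the later steps, which is one simple way to picture trading run-time efficiency for recall.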

Keywords

    Filtering, Information extraction, Relevance, Run-time efficiency, Truth maintenance

Cite this

Information Extraction as a Filtering Task. / Wachsmuth, Henning; Stein, Benno; Engels, Gregor.
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management. New York: Association for Computing Machinery (ACM), 2013. p. 2049-2058.

Wachsmuth, H, Stein, B & Engels, G 2013, Information Extraction as a Filtering Task. in CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management. Association for Computing Machinery (ACM), New York, pp. 2049-2058, 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, San Francisco, CA, United States, 27 Oct 2013. https://doi.org/10.1145/2505515.2505557
Wachsmuth, H., Stein, B., & Engels, G. (2013). Information Extraction as a Filtering Task. In CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management (pp. 2049-2058). Association for Computing Machinery (ACM). https://doi.org/10.1145/2505515.2505557
Wachsmuth H, Stein B, Engels G. Information Extraction as a Filtering Task. In CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management. New York: Association for Computing Machinery (ACM). 2013. p. 2049-2058 doi: 10.1145/2505515.2505557
Wachsmuth, Henning ; Stein, Benno ; Engels, Gregor. / Information Extraction as a Filtering Task. CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management. New York : Association for Computing Machinery (ACM), 2013. pp. 2049-2058
@inproceedings{d4f0797e60284448ae48e67f8398743e,
title = "Information Extraction as a Filtering Task",
abstract = "Information extraction is usually approached as an annotation task: Input texts run through several analysis steps of an extraction process in which different semantic concepts are annotated and matched against the slots of templates. We argue that such an approach lacks an efficient control of the input of the analysis steps. In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others. We model all dependencies between the semantic concepts sought for with a truth maintenance system, which then efficiently infers the portions of text to be annotated in each analysis step. The filtering view enables an information extraction system (1) to annotate only relevant portions of input texts and (2) to easily trade its run-time efficiency for its recall. We provide our approach as an open-source extension of Apache UIMA and we show the potential of our approach in a number of experiments. Copyright is held by the owner/author(s).",
keywords = "Filtering, Information extraction, Relevance, Run-time efficiency, Truth maintenance",
author = "Henning Wachsmuth and Benno Stein and Gregor Engels",
year = "2013",
month = oct,
day = "27",
doi = "10.1145/2505515.2505557",
language = "English",
isbn = "9781450322638",
pages = "2049--2058",
booktitle = "CIKM '13: Proceedings of the 22nd ACM international conference on Information \& Knowledge Management",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",
note = "22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 ; Conference date: 27-10-2013 Through 01-11-2013",

}

TY - GEN

T1 - Information Extraction as a Filtering Task

AU - Wachsmuth, Henning

AU - Stein, Benno

AU - Engels, Gregor

PY - 2013/10/27

Y1 - 2013/10/27

AB - Information extraction is usually approached as an annotation task: Input texts run through several analysis steps of an extraction process in which different semantic concepts are annotated and matched against the slots of templates. We argue that such an approach lacks an efficient control of the input of the analysis steps. In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others. We model all dependencies between the semantic concepts sought for with a truth maintenance system, which then efficiently infers the portions of text to be annotated in each analysis step. The filtering view enables an information extraction system (1) to annotate only relevant portions of input texts and (2) to easily trade its run-time efficiency for its recall. We provide our approach as an open-source extension of Apache UIMA and we show the potential of our approach in a number of experiments. Copyright is held by the owner/author(s).

KW - Filtering

KW - Information extraction

KW - Relevance

KW - Run-time efficiency

KW - Truth maintenance

UR - http://www.scopus.com/inward/record.url?scp=84889566679&partnerID=8YFLogxK

U2 - 10.1145/2505515.2505557

DO - 10.1145/2505515.2505557

M3 - Conference contribution

AN - SCOPUS:84889566679

SN - 9781450322638

SP - 2049

EP - 2058

BT - CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management

PB - Association for Computing Machinery (ACM)

CY - New York

T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013

Y2 - 27 October 2013 through 1 November 2013

ER -
