Details
Original language | English |
---|---|
Title of host publication | Proceedings of the 5th International Joint Conference on Natural Language Processing |
Editors | Haifeng Wang, David Yarowsky |
Pages | 632-640 |
Number of pages | 9 |
ISBN (electronic) | 9789744665645 |
Publication status | Published - Nov 2011 |
Externally published | Yes |
Event | 5th International Joint Conference on Natural Language Processing, IJCNLP 2011 - Chiang Mai, Thailand. Duration: 8 Nov 2011 → 13 Nov 2011 |
Abstract
The term “genre” covers different aspects of both texts and documents, and it has led to many classification schemes. This makes different approaches to genre identification incomparable and the task itself unclear. We introduce the linguistically motivated text classification task language function analysis, LFA, which focuses on one well-defined aspect of genres. The aim of LFA is to determine whether a text is predominantly expressive, appellative, or informative. LFA can be used in search and mining applications to efficiently filter documents of interest. Our approach to LFA relies on fast machine learning classifiers with features from different research areas. We evaluate this approach on a new corpus with 4,806 product texts from two domains. Within one domain, we correctly classify up to 82% of the texts, but differences in feature distribution limit accuracy on out-of-domain data.
ASJC Scopus subject areas
- Arts and Humanities(all)
  - Language and Linguistics
- Computer Science(all)
  - Artificial Intelligence
  - Software
- Social Sciences(all)
  - Linguistics and Language
Cite this
Back to the Roots of Genres. / Wachsmuth, Henning; Bujna, Kathrin. Proceedings of the 5th International Joint Conference on Natural Language Processing. ed. / Haifeng Wang; David Yarowsky. 2011. p. 632-640.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research
TY - GEN
T1 - Back to the Roots of Genres
T2 - 5th International Joint Conference on Natural Language Processing, IJCNLP 2011
AU - Wachsmuth, Henning
AU - Bujna, Kathrin
N1 - Funding Information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS08007A.
PY - 2011/11
Y1 - 2011/11
UR - http://www.scopus.com/inward/record.url?scp=85041137690&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85041137690
SP - 632
EP - 640
BT - Proceedings of the 5th International Joint Conference on Natural Language Processing
A2 - Wang, Haifeng
A2 - Yarowsky, David
Y2 - 8 November 2011 through 13 November 2011
ER -