Back to the Roots of Genres: Text Classification by Language Function

Research output: Chapter in book/report/conference proceeding > Conference contribution > Research

Authors

  • Henning Wachsmuth
  • Kathrin Bujna

External Research Organisations

  • Paderborn University

Details

Original language: English
Title of host publication: Proceedings of the 5th International Joint Conference on Natural Language Processing
Editors: Haifeng Wang, David Yarowsky
Pages: 632-640
Number of pages: 9
ISBN (electronic): 9789744665645
Publication status: Published - Nov 2011
Externally published: Yes
Event: 5th International Joint Conference on Natural Language Processing, IJCNLP 2011 - Chiang Mai, Thailand
Duration: 8 Nov 2011 to 13 Nov 2011

Abstract

The term “genre” covers different aspects of both texts and documents, and it has led to many classification schemes. This makes different approaches to genre identification incomparable and the task itself unclear. We introduce the linguistically motivated text classification task language function analysis, LFA, which focuses on one well-defined aspect of genres. The aim of LFA is to determine whether a text is predominantly expressive, appellative, or informative. LFA can be used in search and mining applications to efficiently filter documents of interest. Our approach to LFA relies on fast machine learning classifiers with features from different research areas. We evaluate this approach on a new corpus with 4,806 product texts from two domains. Within one domain, we correctly classify up to 82% of the texts, but differences in feature distribution limit accuracy on out-of-domain data.

Cite this

Back to the Roots of Genres: Text Classification by Language Function. / Wachsmuth, Henning; Bujna, Kathrin.
Proceedings of the 5th International Joint Conference on Natural Language Processing. ed. / Haifeng Wang; David Yarowsky. 2011. p. 632-640.

Wachsmuth, H & Bujna, K 2011, Back to the Roots of Genres: Text Classification by Language Function. in H Wang & D Yarowsky (eds), Proceedings of the 5th International Joint Conference on Natural Language Processing. pp. 632-640, 5th International Joint Conference on Natural Language Processing, IJCNLP 2011, Chiang Mai, Thailand, 8 Nov 2011. <https://aclanthology.org/I11-1071.pdf>
Wachsmuth, H., & Bujna, K. (2011). Back to the Roots of Genres: Text Classification by Language Function. In H. Wang, & D. Yarowsky (Eds.), Proceedings of the 5th International Joint Conference on Natural Language Processing (pp. 632-640). https://aclanthology.org/I11-1071.pdf
Wachsmuth H, Bujna K. Back to the Roots of Genres: Text Classification by Language Function. In Wang H, Yarowsky D, editors, Proceedings of the 5th International Joint Conference on Natural Language Processing. 2011. p. 632-640
Wachsmuth, Henning ; Bujna, Kathrin. / Back to the Roots of Genres : Text Classification by Language Function. Proceedings of the 5th International Joint Conference on Natural Language Processing. editor / Haifeng Wang ; David Yarowsky. 2011. pp. 632-640
@inproceedings{41358c80f81f46a6a6af36f02ec78b04,
title = "Back to the Roots of Genres: Text Classification by Language Function",
abstract = "The term “genre” covers different aspects of both texts and documents, and it has led to many classification schemes. This makes different approaches to genre identification incomparable and the task itself unclear. We introduce the linguistically motivated text classification task language function analysis, LFA, which focuses on one well-defined aspect of genres. The aim of LFA is to determine whether a text is predominantly expressive, appellative, or informative. LFA can be used in search and mining applications to efficiently filter documents of interest. Our approach to LFA relies on fast machine learning classifiers with features from different research areas. We evaluate this approach on a new corpus with 4,806 product texts from two domains. Within one domain, we correctly classify up to 82% of the texts, but differences in feature distribution limit accuracy on out-of-domain data.",
author = "Henning Wachsmuth and Kathrin Bujna",
note = "Funding Information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS08007A.; 5th International Joint Conference on Natural Language Processing, IJCNLP 2011 ; Conference date: 08-11-2011 Through 13-11-2011",
year = "2011",
month = nov,
language = "English",
pages = "632--640",
editor = "Haifeng Wang and David Yarowsky",
booktitle = "Proceedings of the 5th International Joint Conference on Natural Language Processing",

}

TY - GEN

T1 - Back to the Roots of Genres: Text Classification by Language Function

T2 - 5th International Joint Conference on Natural Language Processing, IJCNLP 2011

AU - Wachsmuth, Henning

AU - Bujna, Kathrin

N1 - Funding Information: This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under contract number 01IS08007A.

PY - 2011/11

Y1 - 2011/11

AB - The term “genre” covers different aspects of both texts and documents, and it has led to many classification schemes. This makes different approaches to genre identification incomparable and the task itself unclear. We introduce the linguistically motivated text classification task language function analysis, LFA, which focuses on one well-defined aspect of genres. The aim of LFA is to determine whether a text is predominantly expressive, appellative, or informative. LFA can be used in search and mining applications to efficiently filter documents of interest. Our approach to LFA relies on fast machine learning classifiers with features from different research areas. We evaluate this approach on a new corpus with 4,806 product texts from two domains. Within one domain, we correctly classify up to 82% of the texts, but differences in feature distribution limit accuracy on out-of-domain data.

UR - http://www.scopus.com/inward/record.url?scp=85041137690&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85041137690

SP - 632

EP - 640

BT - Proceedings of the 5th International Joint Conference on Natural Language Processing

A2 - Wang, Haifeng

A2 - Yarowsky, David

Y2 - 8 November 2011 through 13 November 2011

ER -
