Automatic classification of documents in cold-start scenarios

Ricardo Kawase; Marco Fisichella; Bernardo Pereira Nunes; Kyung Hun Ha; Markus Bick

Details

Original language	English
Title of host publication	3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013
Publication status	Published - 2013
Event	3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013 - Madrid, Spain. Duration: 12 June 2013 → 14 June 2013

Publication series

Name	ACM International Conference Proceeding Series

Abstract

Document classification is key to ensuring the quality of any digital library. However, classifying documents is a very time-consuming task. In addition, few or none of the documents in a newly created repository are classified. The non-classification of documents not only prevents users from finding information but also hinders the system's ability to recommend relevant items. Moreover, the lack of classified documents prevents any kind of machine learning algorithm from automatically annotating these items. In this work, we propose a novel approach to automatically classifying documents that differs from previous works in that it exploits the wisdom of the crowds available on the Web. Our proposed strategy adapts an automatic tagging approach combined with a straightforward matching algorithm to classify documents in a given domain classification. To validate our findings, we compared our methods against existing ones and performed a user evaluation with 61 participants to estimate the quality of the classifications. Results show that, in 72% of the cases, the automatic classification is relevant and well accepted by participants. In conclusion, automatic classification can facilitate access to relevant documents.
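The abstract mentions combining automatically generated tags with a straightforward matching step against a domain classification. The sketch below is only an illustration of that general idea and is not the authors' algorithm: it assumes the tags have already been produced by some external tagger, and it swaps in plain Jaccard token overlap as the matching rule. The function names `tokenize` and `classify`, and the example tags and categories, are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(tags, categories):
    """Rank category labels by Jaccard overlap with the document's tags.

    Assumptions (not from the paper): `tags` comes from an external
    automatic tagger, `categories` is a flat list of labels from some
    domain classification, and token overlap is the matching criterion.
    Categories that share no token with the tags are dropped.
    """
    tag_tokens = set()
    for tag in tags:
        tag_tokens |= tokenize(tag)

    ranked = []
    for label in categories:
        label_tokens = tokenize(label)
        union = tag_tokens | label_tokens
        if not union:
            continue
        score = len(tag_tokens & label_tokens) / len(union)
        if score > 0:
            ranked.append((label, score))

    # Highest score first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical tags and category labels, for illustration only.
    doc_tags = ["machine learning", "neural networks", "classification"]
    scheme = ["Machine Learning", "Databases", "Computer Networks"]
    for label, score in classify(doc_tags, scheme):
        print(f"{label}: {score:.2f}")
```

Because the matching step needs no previously classified documents, a rule of this kind can run in a cold-start setting, which is the scenario the abstract targets. A real system would likely need stemming or synonym handling on top of exact token overlap.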

ASJC Scopus subject areas

Computer Science (all)
Software
Computer Science (all)
Human-Computer Interaction
Computer Science (all)
Computer Vision and Pattern Recognition
Computer Science (all)
Computer Networks and Communications

Cite this

Automatic classification of documents in cold-start scenarios. / Kawase, Ricardo; Fisichella, Marco; Nunes, Bernardo Pereira et al.
3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013. 2013. 19 (ACM International Conference Proceeding Series).

Publication: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer-reviewed

Kawase, R, Fisichella, M, Nunes, BP, Ha, KH & Bick, M 2013, Automatic classification of documents in cold-start scenarios. in 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013., 19, ACM International Conference Proceeding Series, 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013, Madrid, Spain, 12 June 2013.

Kawase, R., Fisichella, M., Nunes, B. P., Ha, K. H., & Bick, M. (2013). Automatic classification of documents in cold-start scenarios. In 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013 Article 19 (ACM International Conference Proceeding Series).

Kawase R, Fisichella M, Nunes BP, Ha KH, Bick M. Automatic classification of documents in cold-start scenarios. in 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013. 2013. 19. (ACM International Conference Proceeding Series).

Kawase, Ricardo ; Fisichella, Marco ; Nunes, Bernardo Pereira et al. / Automatic classification of documents in cold-start scenarios. 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013. 2013. (ACM International Conference Proceeding Series).

@inproceedings{6ab32d73241c4556b8e592bf6cb2949d,

title = "Automatic classification of documents in cold-start scenarios",

abstract = "Document classification is key to ensuring the quality of any digital library. However, classifying documents is a very time-consuming task. In addition, few or none of the documents in a newly created repository are classified. The non-classification of documents not only prevents users from finding information but also hinders the system's ability to recommend relevant items. Moreover, the lack of classified documents prevents any kind of machine learning algorithm from automatically annotating these items. In this work, we propose a novel approach to automatically classifying documents that differs from previous works in that it exploits the wisdom of the crowds available on the Web. Our proposed strategy adapts an automatic tagging approach combined with a straightforward matching algorithm to classify documents in a given domain classification. To validate our findings, we compared our methods against existing ones and performed a user evaluation with 61 participants to estimate the quality of the classifications. Results show that, in 72% of the cases, the automatic classification is relevant and well accepted by participants. In conclusion, automatic classification can facilitate access to relevant documents.",

keywords = "Automatic classification, Cold-start, Digital libraries, Information retrieval, User evaluation",

author = "Ricardo Kawase and Marco Fisichella and Nunes, {Bernardo Pereira} and Ha, {Kyung Hun} and Markus Bick",

year = "2013",

language = "English",

isbn = "9781450318501",

series = "ACM International Conference Proceeding Series",

booktitle = "3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013",

note = "3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013 ; Conference date: 12-06-2013 Through 14-06-2013",

}

TY - GEN

T1 - Automatic classification of documents in cold-start scenarios

AU - Kawase, Ricardo

AU - Fisichella, Marco

AU - Nunes, Bernardo Pereira

AU - Ha, Kyung Hun

AU - Bick, Markus

PY - 2013

Y1 - 2013

N2 - Document classification is key to ensuring the quality of any digital library. However, classifying documents is a very time-consuming task. In addition, few or none of the documents in a newly created repository are classified. The non-classification of documents not only prevents users from finding information but also hinders the system's ability to recommend relevant items. Moreover, the lack of classified documents prevents any kind of machine learning algorithm from automatically annotating these items. In this work, we propose a novel approach to automatically classifying documents that differs from previous works in that it exploits the wisdom of the crowds available on the Web. Our proposed strategy adapts an automatic tagging approach combined with a straightforward matching algorithm to classify documents in a given domain classification. To validate our findings, we compared our methods against existing ones and performed a user evaluation with 61 participants to estimate the quality of the classifications. Results show that, in 72% of the cases, the automatic classification is relevant and well accepted by participants. In conclusion, automatic classification can facilitate access to relevant documents.

KW - Automatic classification

KW - Cold-start

KW - Digital libraries

KW - Information retrieval

KW - User evaluation

UR - http://www.scopus.com/inward/record.url?scp=84879751815&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84879751815

SN - 9781450318501

T3 - ACM International Conference Proceeding Series

BT - 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013

T2 - 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013

Y2 - 12 June 2013 through 14 June 2013

ER -
