A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs

Thi Huyen Nguyen; Marco Fisichella; Koustav Rudra

doi:10.1109/TCSS.2024.3391395

Details

Original language	English
Pages (from-to)	6229-6241
Number of pages	13
Journal	IEEE Transactions on Computational Social Systems
Volume	11
Issue number	5
Early online date	13 May 2024
Publication status	Published - Oct 2024

Abstract

Social media platforms, such as Twitter, are crucial resources to obtain situational information during disease outbreaks. Due to the sheer volume of user-generated content, providing tools that can automatically classify input texts into various types, such as symptoms, transmission, prevention measures, etc., and generate concise situational updates is necessary. Apart from high classification accuracy, interpretability is an important requirement when designing machine learning models for tasks in the medical domain. In this article, we provide annotated epidemic-related datasets with labels of information types and rationales, which are short phrases from the original tweets, to support the assigned labels. Next, we introduce a trustworthy approach for the automatic classification of tweets posted during epidemics. Our classification model is able to extract short explanations/rationales for output decisions on unseen data. Moreover, we propose a simple graph-based ranking method to generate short summaries of tweets. Experiments on two epidemic-related datasets show the following: 1) our classification model obtains an average of 82% Macro-F1 and better interpretability scores in terms of Token-F1 (20% improvement) than baselines; 2) the extracted rationales capture essential disease-related information in the tweets; 3) our graph-based method with rationales is simple yet efficient for generating concise situational updates.
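
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes common definitions: Macro-F1 as the unweighted mean of per-class F1, and Token-F1 as the F1 between the sets of token positions in a predicted and a gold rationale. The function names and the toy labels are hypothetical, and the article's exact evaluation protocol is not stated on this page.

```python
def token_f1(predicted_positions, gold_positions):
    """F1 between predicted and gold rationale token positions (assumed definition)."""
    predicted, gold = set(predicted_positions), set(gold_positions)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Each label occurs at least once, so the denominator is never zero.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical toy data: a four-token predicted rationale against a two-token gold one,
# and three tweets labelled with made-up information types.
rationale_score = token_f1({0, 1, 2, 3}, {1, 3})
label_score = macro_f1(
    ["symptom", "prevention", "symptom"],
    ["symptom", "symptom", "symptom"],
)
```

Because Macro-F1 weights every class equally, a rarely occurring information type that the classifier misses pulls the score down as much as a frequent one would.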

ASJC Scopus subject areas

Mathematics (all)
Modelling and Simulation
Social Sciences (all)
Social Sciences (miscellaneous)
Computer Science (all)
Human-Computer Interaction

Sustainable Development Goals

SDG 3 – Good Health and Well-being

Cite this

A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs. / Nguyen, Thi Huyen; Fisichella, Marco; Rudra, Koustav.
In: IEEE Transactions on Computational Social Systems, Vol. 11, No. 5, 10.2024, p. 6229-6241.

Research output: Contribution to journal › Article › Research › Peer review

Nguyen, TH, Fisichella, M & Rudra, K 2024, 'A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs', IEEE Transactions on Computational Social Systems, vol. 11, no. 5, pp. 6229-6241. https://doi.org/10.1109/TCSS.2024.3391395

Nguyen, T. H., Fisichella, M., & Rudra, K. (2024). A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs. IEEE Transactions on Computational Social Systems, 11(5), 6229-6241. https://doi.org/10.1109/TCSS.2024.3391395

Nguyen TH, Fisichella M, Rudra K. A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs. IEEE Transactions on Computational Social Systems. 2024 Oct;11(5):6229-6241. Epub 2024 May 13. doi: 10.1109/TCSS.2024.3391395

Nguyen, Thi Huyen ; Fisichella, Marco ; Rudra, Koustav. / A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs. In: IEEE Transactions on Computational Social Systems. 2024 ; Vol. 11, No. 5. pp. 6229-6241.

@article{af2b8ef7075c4f3980446fc7af14cbd0,
  title = "A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs",
  abstract = "Social media platforms, such as Twitter, are crucial resources to obtain situational information during disease outbreaks. Due to the sheer volume of user-generated content, providing tools that can automatically classify input texts into various types, such as symptoms, transmission, prevention measures, etc., and generate concise situational updates is necessary. Apart from high classification accuracy, interpretability is an important requirement when designing machine learning models for tasks in medical domain. In this article, we provide annotated epidemic-related datasets with labels of information types and rationales, which are short phrases from the original tweets, to support the assigned labels. Next, we introduce a trustworthy approach for the automatic classification of tweets posted during epidemics. Our classification model is able to extract short explanations/rationales for output decisions on unseen data. Moreover, we propose a simple graph-based ranking method to generate short summaries of tweets. Experiments on two epidemic-related datasets show the following: 1) our classification model obtains an average of 82% Macro-F1 and better interpretability scores in terms of Token-F1 (20% improvement) than baselines; 2) the extracted rationales capture essential disease-related information in the tweets; 3) our graph-based method with rationales is simple, yet efficient for generating concise situational updates.",
  keywords = "Blogs, Classification, Computational modeling, Data mining, Diseases, epidemic, Feature extraction, health crisis, microblogs, Social networking (online), Transformers, trustworthy systems",
  author = "Nguyen, {Thi Huyen} and Marco Fisichella and Koustav Rudra",
  note = "Publisher Copyright: IEEE",
  year = "2024",
  month = oct,
  doi = "10.1109/TCSS.2024.3391395",
  language = "English",
  volume = "11",
  pages = "6229--6241",
  number = "5",
}

TY - JOUR
T1 - A Trustworthy Approach to Classify and Analyze Epidemic-Related Information From Microblogs
AU - Nguyen, Thi Huyen
AU - Fisichella, Marco
AU - Rudra, Koustav
N1 - Publisher Copyright: IEEE
PY - 2024/10
Y1 - 2024/10
AB - Social media platforms, such as Twitter, are crucial resources to obtain situational information during disease outbreaks. Due to the sheer volume of user-generated content, providing tools that can automatically classify input texts into various types, such as symptoms, transmission, prevention measures, etc., and generate concise situational updates is necessary. Apart from high classification accuracy, interpretability is an important requirement when designing machine learning models for tasks in medical domain. In this article, we provide annotated epidemic-related datasets with labels of information types and rationales, which are short phrases from the original tweets, to support the assigned labels. Next, we introduce a trustworthy approach for the automatic classification of tweets posted during epidemics. Our classification model is able to extract short explanations/rationales for output decisions on unseen data. Moreover, we propose a simple graph-based ranking method to generate short summaries of tweets. Experiments on two epidemic-related datasets show the following: 1) our classification model obtains an average of 82% Macro-F1 and better interpretability scores in terms of Token-F1 (20% improvement) than baselines; 2) the extracted rationales capture essential disease-related information in the tweets; 3) our graph-based method with rationales is simple, yet efficient for generating concise situational updates.
KW - Blogs
KW - Classification
KW - Computational modeling
KW - Data mining
KW - Diseases
KW - epidemic
KW - Feature extraction
KW - health crisis
KW - microblogs
KW - Social networking (online)
KW - Transformers
KW - trustworthy systems
UR - http://www.scopus.com/inward/record.url?scp=85193299864&partnerID=8YFLogxK
U2 - 10.1109/TCSS.2024.3391395
DO - 10.1109/TCSS.2024.3391395
M3 - Article
AN - SCOPUS:85193299864
VL - 11
SP - 6229
EP - 6241
JO - IEEE Transactions on Computational Social Systems
JF - IEEE Transactions on Computational Social Systems
IS - 5
ER -

Research@Leibniz University

By the same authors

Open benchmark for filtering techniques in entity resolution

Does a language model “understand” high school math? A survey of deep learning based word problem solvers

FairTrade: Achieving Pareto-Optimal Trade-Offs between Balanced Accuracy and Fairness in Federated Learning

Harnessing Empathy and Ethics for Relevance Detection and Information Categorization in Climate and COVID-19 Tweets

LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios
