Eudetector: Leveraging language model to identify EU-related news

Koustav Rudra; Danny Tran; Miroslav Shaltev

doi:10.1145/3442442.3452324

Details

Original language	English
Title of host publication	The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021
Pages	380-384
Number of pages	5
ISBN (electronic)	9781450383134
Publication status	Published - 3 Jun 2021
Event	30th World Wide Web Conference, WWW 2021 - Ljubljana, Slovenia; Duration: 19 Apr 2021 → 23 Apr 2021

Publication series

Name	The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021

Abstract

News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely, the EU) and propose a method to identify news that has an impact on Europe from any aspect, such as finance, business, crime, or politics. Predicting the location of the news is itself a challenging task. Most approaches restrict themselves to named entities or handcrafted features. In this paper, we try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and some handcrafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The auto-regressive language-model-based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at when deciding the news category. Entities such as person, location, and organization turn out to be good rationale tokens for the prediction.

ASJC Scopus subject areas

Computer Science(all)
Computer Networks and Communications
Software

Sustainable Development Goals

SDG 16 - Peace, Justice and Strong Institutions

Cite this

Eudetector: Leveraging language model to identify EU-related news. / Rudra, Koustav; Tran, Danny; Shaltev, Miroslav.
The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. p. 380-384 (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Rudra, K, Tran, D & Shaltev, M 2021, Eudetector: Leveraging language model to identify EU-related news. in The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021, pp. 380-384, 30th World Wide Web Conference, WWW 2021, Ljubljana, Slovenia, 19 Apr 2021. https://doi.org/10.1145/3442442.3452324

Rudra, K., Tran, D., & Shaltev, M. (2021). Eudetector: Leveraging language model to identify EU-related news. In The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021 (pp. 380-384). (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021). https://doi.org/10.1145/3442442.3452324

Rudra K, Tran D, Shaltev M. Eudetector: Leveraging language model to identify EU-related news. In The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. p. 380-384. (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021). doi: 10.1145/3442442.3452324

Rudra, Koustav ; Tran, Danny ; Shaltev, Miroslav. / Eudetector : Leveraging language model to identify EU-related news. The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. pp. 380-384 (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021).


@inproceedings{6dea650c260a44b6aa2864eaf7d41ddc,

title = "Eudetector: Leveraging language model to identify {EU}-related news",

abstract = "News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely EU) and propose a method to identify news that has an impact on Europe from any aspect such as financial, business, crime, politics, etc. Predicting the location of the news is itself a challenging task. Most of the approaches restrict themselves towards named entities or handcrafted features. In this paper, we try to overcome that limitation i.e., instead of focusing only on the named entities (Europe location, politicians etc.) and some hand-crafted rules, we also explore the context of news articles with the help of pre-Trained language model BERT. The auto-regressive language model based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at for deciding the news category. Entities such as person, location, organization turn out to be good rationale tokens for the prediction.",

author = "Koustav Rudra and Danny Tran and Miroslav Shaltev",

note = "Funding Information: Funding for this project was in part provided by the European Union{\textquoteright}s Horizon 2020 research and innovation programme under grant agreement No 832921. ; 30th World Wide Web Conference, WWW 2021 ; Conference date: 19-04-2021 Through 23-04-2021",

year = "2021",

month = jun,

day = "3",

doi = "10.1145/3442442.3452324",

language = "English",

series = "The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021",

pages = "380--384",

booktitle = "The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021",

}


TY - GEN

T1 - Eudetector

T2 - 30th World Wide Web Conference, WWW 2021

AU - Rudra, Koustav

AU - Tran, Danny

AU - Shaltev, Miroslav

N1 - Funding Information: Funding for this project was in part provided by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 832921.

PY - 2021/6/3

Y1 - 2021/6/3

N2 - News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely EU) and propose a method to identify news that has an impact on Europe from any aspect such as financial, business, crime, politics, etc. Predicting the location of the news is itself a challenging task. Most of the approaches restrict themselves towards named entities or handcrafted features. In this paper, we try to overcome that limitation i.e., instead of focusing only on the named entities (Europe location, politicians etc.) and some hand-crafted rules, we also explore the context of news articles with the help of pre-Trained language model BERT. The auto-regressive language model based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at for deciding the news category. Entities such as person, location, organization turn out to be good rationale tokens for the prediction.

AB - News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely EU) and propose a method to identify news that has an impact on Europe from any aspect such as financial, business, crime, politics, etc. Predicting the location of the news is itself a challenging task. Most of the approaches restrict themselves towards named entities or handcrafted features. In this paper, we try to overcome that limitation i.e., instead of focusing only on the named entities (Europe location, politicians etc.) and some hand-crafted rules, we also explore the context of news articles with the help of pre-Trained language model BERT. The auto-regressive language model based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at for deciding the news category. Entities such as person, location, organization turn out to be good rationale tokens for the prediction.

UR - http://www.scopus.com/inward/record.url?scp=85107673999&partnerID=8YFLogxK

U2 - 10.1145/3442442.3452324

DO - 10.1145/3442442.3452324

M3 - Conference contribution

AN - SCOPUS:85107673999

T3 - The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021

SP - 380

EP - 384

BT - The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021

Y2 - 19 April 2021 through 23 April 2021

ER -

