Eudetector: Leveraging language model to identify EU-related news

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Koustav Rudra
  • Danny Tran
  • Miroslav Shaltev

Research Organisations

External Research Organisations

  • Leibniz Centre for Tropical Marine Research (ZMT)

Details

Original language: English
Title of host publication: The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021
Pages: 380-384
Number of pages: 5
ISBN (electronic): 9781450383134
Publication status: Published - 3 Jun 2021
Event: 30th World Wide Web Conference, WWW 2021 - Ljubljana, Slovenia
Duration: 19 Apr 2021 - 23 Apr 2021

Publication series

Name: The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021

Abstract

News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely, the EU) and propose a method to identify news that has an impact on Europe from any aspect, such as finance, business, crime, or politics. Predicting the location of the news is itself a challenging task. Most approaches restrict themselves to named entities or hand-crafted features. We try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and some hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The BERT-based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.

Cite this

Eudetector: Leveraging language model to identify EU-related news. / Rudra, Koustav; Tran, Danny; Shaltev, Miroslav.
The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. p. 380-384 (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021).

Rudra, K, Tran, D & Shaltev, M 2021, Eudetector: Leveraging language model to identify EU-related news. in The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021, pp. 380-384, 30th World Wide Web Conference, WWW 2021, Ljubljana, Slovenia, 19 Apr 2021. https://doi.org/10.1145/3442442.3452324
Rudra, K., Tran, D., & Shaltev, M. (2021). Eudetector: Leveraging language model to identify EU-related news. In The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021 (pp. 380-384). (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021). https://doi.org/10.1145/3442442.3452324
Rudra K, Tran D, Shaltev M. Eudetector: Leveraging language model to identify EU-related news. In The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. p. 380-384. (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021). doi: 10.1145/3442442.3452324
Rudra, Koustav ; Tran, Danny ; Shaltev, Miroslav. / Eudetector: Leveraging language model to identify EU-related news. The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. pp. 380-384 (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021).
@inproceedings{6dea650c260a44b6aa2864eaf7d41ddc,
title = "Eudetector: Leveraging language model to identify {EU}-related news",
abstract = "News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely, the EU) and propose a method to identify news that has an impact on Europe from any aspect, such as finance, business, crime, or politics. Predicting the location of the news is itself a challenging task. Most approaches restrict themselves to named entities or hand-crafted features. We try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and some hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The BERT-based European news detector shows about 9-19\% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.",
author = "Koustav Rudra and Danny Tran and Miroslav Shaltev",
note = "Funding Information: Funding for this project was in part provided by the European Union{\textquoteright}s Horizon 2020 research and innovation programme under grant agreement No 832921. ; 30th World Wide Web Conference, WWW 2021 ; Conference date: 19-04-2021 Through 23-04-2021",
year = "2021",
month = jun,
day = "3",
doi = "10.1145/3442442.3452324",
language = "English",
series = "The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021",
pages = "380--384",
booktitle = "The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021",

}

TY - GEN

T1 - Eudetector

T2 - 30th World Wide Web Conference, WWW 2021

AU - Rudra, Koustav

AU - Tran, Danny

AU - Shaltev, Miroslav

N1 - Funding Information: Funding for this project was in part provided by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 832921.

PY - 2021/6/3

Y1 - 2021/6/3

AB - News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely, the EU) and propose a method to identify news that has an impact on Europe from any aspect, such as finance, business, crime, or politics. Predicting the location of the news is itself a challenging task. Most approaches restrict themselves to named entities or hand-crafted features. We try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and some hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The BERT-based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.

UR - http://www.scopus.com/inward/record.url?scp=85107673999&partnerID=8YFLogxK

U2 - 10.1145/3442442.3452324

DO - 10.1145/3442442.3452324

M3 - Conference contribution

AN - SCOPUS:85107673999

T3 - The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021

SP - 380

EP - 384

BT - The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021

Y2 - 19 April 2021 through 23 April 2021

ER -
