Eudetector: Leveraging language model to identify EU-related news

Publication: Contribution to book/report/anthology/conference proceedings; Conference paper; Research; Peer-reviewed

Authors

Koustav Rudra, Danny Tran, Miroslav Shaltev

External organisations

  • Leibniz-Zentrum für Marine Tropenökologie GmbH

Details

Original language: English
Title of host publication: The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021
Pages: 380-384
Number of pages: 5
ISBN (electronic): 9781450383134
Publication status: Published (3 June 2021)
Event: 30th World Wide Web Conference, WWW 2021, Ljubljana, Slovenia
Duration: 19 Apr 2021 to 23 Apr 2021

Publication series

Name: The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021

Abstract

News media reflect the present state of a country or region to their audiences. Media outlets in a region publish different kinds of news for their local and global audiences. In this paper, we focus on Europe (more precisely, the EU) and propose a method to identify news that has an impact on Europe from any aspect, such as finance, business, crime, or politics. Predicting the location of the news is itself a challenging task. Most existing approaches restrict themselves to named entities or handcrafted features. In this paper, we try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and some hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The BERT-based European news detector shows an improvement of about 9-19% in F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT actually relies on when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.
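The abstract reports F-score gains over baselines that rely on named entities or hand-crafted rules. As a hedged illustration only, the sketch below shows what such an entity-gazetteer baseline might look like and how a binary F1 score for the EU-related class is computed. The gazetteer `EU_TERMS`, the functions `keyword_baseline` and `f1_score`, and the toy headlines and labels are all hypothetical; the paper's actual baselines, dataset, and F-score variant are not described in this record.

```python
from typing import List

# Hypothetical gazetteer of EU-related entity strings (illustrative only, not from the paper).
EU_TERMS = {"brussels", "european commission", "eurozone", "berlin", "paris", "merkel"}


def keyword_baseline(text: str) -> int:
    """Label a headline EU-related (1) if it contains any gazetteer term (naive substring match)."""
    lowered = text.lower()
    return int(any(term in lowered for term in EU_TERMS))


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive (EU-related) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy headlines and gold labels, invented for illustration.
headlines = [
    "European Commission proposes new data rules",
    "Eurozone inflation rises again",
    "Fishing quotas spark row between member states",  # EU-related, but no gazetteer entity
    "Stock markets in Tokyo close higher",
]
labels = [1, 1, 1, 0]

predictions = [keyword_baseline(h) for h in headlines]
print(predictions)                               # [1, 1, 0, 0]
print(round(f1_score(labels, predictions), 3))   # 0.8
```

The third headline illustrates the limitation the abstract describes: it is EU-related by context but names no listed entity, so the rule misses it (precision 1.0, recall 2/3). A contextual model such as BERT can, in principle, pick up such cases.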

Cite this

Eudetector: Leveraging language model to identify EU-related news. / Rudra, Koustav; Tran, Danny; Shaltev, Miroslav.
The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. p. 380-384 (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021).

Rudra, K, Tran, D & Shaltev, M 2021, Eudetector: Leveraging language model to identify EU-related news. in The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021, pp. 380-384, 30th World Wide Web Conference, WWW 2021, Ljubljana, Slovenia, 19 Apr 2021. https://doi.org/10.1145/3442442.3452324
Rudra, K., Tran, D., & Shaltev, M. (2021). Eudetector: Leveraging language model to identify EU-related news. In The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021 (pp. 380-384). (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021). https://doi.org/10.1145/3442442.3452324
Rudra K, Tran D, Shaltev M. Eudetector: Leveraging language model to identify EU-related news. In The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. p. 380-384. (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021). doi: 10.1145/3442442.3452324
Rudra, Koustav; Tran, Danny; Shaltev, Miroslav. / Eudetector: Leveraging language model to identify EU-related news. The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021. 2021. pp. 380-384 (The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021).
@inproceedings{6dea650c260a44b6aa2864eaf7d41ddc,
title = "Eudetector: Leveraging language model to identify {EU}-related news",
abstract = "News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely EU) and propose a method to identify news that has an impact on Europe from any aspect such as financial, business, crime, politics, etc. Predicting the location of the news is itself a challenging task. Most of the approaches restrict themselves towards named entities or handcrafted features. In this paper, we try to overcome that limitation i.e., instead of focusing only on the named entities (Europe location, politicians etc.) and some hand-crafted rules, we also explore the context of news articles with the help of pre-Trained language model BERT. The auto-regressive language model based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at for deciding the news category. Entities such as person, location, organization turn out to be good rationale tokens for the prediction.",
author = "Koustav Rudra and Danny Tran and Miroslav Shaltev",
note = "Funding Information: Funding for this project was in part provided by the European Union{\textquoteright}s Horizon 2020 research and innovation programme under grant agreement No 832921. ; 30th World Wide Web Conference, WWW 2021 ; Conference date: 19-04-2021 Through 23-04-2021",
year = "2021",
month = jun,
day = "3",
doi = "10.1145/3442442.3452324",
language = "English",
series = "The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021",
pages = "380--384",
booktitle = "The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021",

}
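The BibTeX entry above uses only double-quoted and bare (macro) field values, so a short sketch can read its fields without a full BibTeX parser. The function `parse_bibtex_fields` and its regular expression are hypothetical helpers written for this illustration; they assume one simple entry and do not handle brace-delimited values, `#` string concatenation, or `@string` macros, so a dedicated BibTeX library is the safer choice for real bibliographies.

```python
import re
from typing import Dict

# Matches `name = "quoted value"` or `name = bareword` (e.g. `month = jun`) inside one entry.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|(\w+))\s*,?')


def parse_bibtex_fields(entry: str) -> Dict[str, str]:
    """Return the fields of a single, simple BibTeX entry as a dict (hypothetical helper)."""
    # Drop the header "@inproceedings{key," so the citation key is not scanned as a field.
    body = entry.split(",", 1)[1]
    fields: Dict[str, str] = {}
    for match in FIELD_RE.finditer(body):
        name, quoted, bare = match.groups()
        fields[name.lower()] = quoted if quoted is not None else bare
    return fields


# A shortened copy of the entry above.
entry = '''@inproceedings{6dea650c260a44b6aa2864eaf7d41ddc,
title = "Eudetector: Leveraging language model to identify eu-related news",
author = "Koustav Rudra and Danny Tran and Miroslav Shaltev",
year = "2021",
month = jun,
doi = "10.1145/3442442.3452324",
pages = "380--384",
}'''

fields = parse_bibtex_fields(entry)
authors = fields["author"].split(" and ")  # BibTeX separates authors with the word "and"
first, last = fields["pages"].split("--")  # BibTeX writes page ranges with "--" (an en dash)
print(authors)      # ['Koustav Rudra', 'Danny Tran', 'Miroslav Shaltev']
print(first, last)  # 380 384
```

Splitting on `" and "` mirrors how BibTeX itself separates names in the `author` field; the `pages` value `380--384` corresponds to the start and end pages `SP` and `EP` in the RIS export.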

TY  - GEN
T1  - Eudetector: Leveraging language model to identify EU-related news
T2  - 30th World Wide Web Conference, WWW 2021
AU  - Rudra, Koustav
AU  - Tran, Danny
AU  - Shaltev, Miroslav
N1  - Funding Information: Funding for this project was in part provided by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 832921.
PY  - 2021/6/3
Y1  - 2021/6/3
AB  - News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely EU) and propose a method to identify news that has an impact on Europe from any aspect such as financial, business, crime, politics, etc. Predicting the location of the news is itself a challenging task. Most of the approaches restrict themselves towards named entities or handcrafted features. In this paper, we try to overcome that limitation i.e., instead of focusing only on the named entities (Europe location, politicians etc.) and some hand-crafted rules, we also explore the context of news articles with the help of pre-Trained language model BERT. The auto-regressive language model based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at for deciding the news category. Entities such as person, location, organization turn out to be good rationale tokens for the prediction.
UR  - http://www.scopus.com/inward/record.url?scp=85107673999&partnerID=8YFLogxK
U2  - 10.1145/3442442.3452324
DO  - 10.1145/3442442.3452324
M3  - Conference contribution
AN  - SCOPUS:85107673999
T3  - The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021
SP  - 380
EP  - 384
BT  - The Web Conference 2021 - Companion of the World Wide Web Conference, WWW 2021
Y2  - 19 April 2021 through 23 April 2021
ER  -
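The RIS export above stores one field per line: a two-character tag, a hyphen, then the value. Multi-valued fields such as `AU` simply repeat the tag, and `ER` closes the record. The sketch below is a hypothetical minimal reader for that layout, assuming well-formed single-line values; it is not a complete implementation of the RIS specification. Standard RIS separates tag and hyphen with two spaces, but the pattern also accepts a single space, as in copies where whitespace was collapsed.

```python
import re
from typing import Dict, List

# Tag line: two-character tag, whitespace, "-", then an optional space and the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-(?:\s(.*))?$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Split RIS text into records; repeated tags (e.g. AU) collect into lists."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = match.group(1), (match.group(2) or "").strip()
        if tag == "ER":
            records.append(current)  # "ER" terminates the current record
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


# A shortened copy of the export above.
sample = """TY  - GEN
T1  - Eudetector
AU  - Rudra, Koustav
AU  - Tran, Danny
AU  - Shaltev, Miroslav
PY  - 2021/6/3
SP  - 380
EP  - 384
DO  - 10.1145/3442442.3452324
ER  -
"""

record = parse_ris(sample)[0]
print(record["AU"])                          # ['Rudra, Koustav', 'Tran, Danny', 'Shaltev, Miroslav']
print(f'{record["SP"][0]}-{record["EP"][0]}')  # 380-384
```

Storing every tag as a list keeps the reader uniform: single-valued tags such as `DO` become one-element lists, while repeated `AU` lines preserve author order.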
