How much is Wikipedia Lagging Behind News?

Research output: Chapter in book/report/conference proceeding > Conference contribution > Research

Authors

  • Besnik Fetahu
  • Abhijit Anand
  • Avishek Anand

Details

Original language: English
Title of host publication: WebSci '15: Proceedings of the ACM Web Science Conference
ISBN (electronic): 9781450336727
Publication status: Published - 2017
Event: 7th ACM Web Science Conference 2015 - Oxford, United Kingdom (UK)
Duration: 28 Jun 2015 to 1 Jul 2015
Conference number: 7

Abstract

Wikipedia, rich in entities and events, is an invaluable resource for various knowledge harvesting, extraction and mining tasks. Numerous resources, such as DBpedia, YAGO and other knowledge bases, are built by extracting entity- and event-based knowledge from it. Online news, on the other hand, is an authoritative and rich source of emerging entities, events and facts relating to existing entities. In this work, we study the creation of entities in Wikipedia with respect to news by examining how entity- and event-based information flows from news to Wikipedia. We analyze the lag of Wikipedia (based on the revision history of the English Wikipedia) against 20 years of The New York Times (NYT) dataset. We model and analyze the lag of entities and events, namely their first appearance in Wikipedia and in NYT, respectively. In our extensive experimental analysis, we find, first, that almost 20% of the external references in entity pages are news articles, which underscores the importance of news to Wikipedia. Second, we observe that the entity-based lag follows a normal distribution with a high standard deviation, whereas the lag for news-based events is typically very low. Finally, we find that events are responsible for the creation of emergent entities: as many as 12% of the entities mentioned in an event page are created after the creation of the event page.
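
The abstract does not spell out how the lag or the emergent-entity share is computed, so the following is only a minimal sketch under stated assumptions: lag is taken as the number of days between an item's first appearance in Wikipedia and its first appearance in NYT (positive when Wikipedia trails the news), and the emergent-entity share is the fraction of entities mentioned on an event page whose own pages were created after the event page. The function names and the sign convention are illustrative and are not taken from the paper.

```python
from datetime import date


def lag_days(first_in_wikipedia: date, first_in_news: date) -> int:
    """Assumed lag definition: days by which Wikipedia trails the news.

    Positive: the item appeared in the news first.
    Negative: the item appeared in Wikipedia first.
    """
    return (first_in_wikipedia - first_in_news).days


def emergent_entity_share(event_created: date, entity_created: list[date]) -> float:
    """Assumed share of mentioned entities created after the event page."""
    if not entity_created:
        return 0.0
    later = sum(1 for created in entity_created if created > event_created)
    return later / len(entity_created)


# Hypothetical dates, made up purely for illustration (not data from the paper).
example_lag = lag_days(date(2010, 1, 15), date(2010, 1, 10))
example_share = emergent_entity_share(
    date(2012, 3, 1),
    [date(2011, 1, 1), date(2012, 3, 5), date(2012, 4, 1), date(2010, 6, 1)],
)
```

Under this convention, a distribution of `lag_days` values over many entities is what a statement such as "follows a normal distribution with a high standard deviation" would be describing.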

Keywords

    cs.IR, Entity lag, Emergent entity density, Event lag, News reference density, News corpora, Wikipedia

Cite this

How much is Wikipedia Lagging Behind News? / Fetahu, Besnik; Anand, Abhijit; Anand, Avishek.
WebSci '15: Proceedings of the ACM Web Science Conference. 2017.

Fetahu, B, Anand, A & Anand, A 2017, How much is Wikipedia Lagging Behind News? in WebSci '15: Proceedings of the ACM Web Science Conference. 7th ACM Web Science Conference 2015, Oxford, United Kingdom (UK), 28 Jun 2015. https://doi.org/10.1145/2786451.2786460
Fetahu, B., Anand, A., & Anand, A. (2017). How much is Wikipedia Lagging Behind News? In WebSci '15: Proceedings of the ACM Web Science Conference. https://doi.org/10.1145/2786451.2786460
Fetahu B, Anand A, Anand A. How much is Wikipedia Lagging Behind News? In WebSci '15: Proceedings of the ACM Web Science Conference. 2017 doi: 10.1145/2786451.2786460
Fetahu, Besnik ; Anand, Abhijit ; Anand, Avishek. / How much is Wikipedia Lagging Behind News? WebSci '15: Proceedings of the ACM Web Science Conference. 2017.
@inproceedings{c331ce6b41eb42f79beb2303653fbe9a,
title = "How much is Wikipedia Lagging Behind News?",
abstract = " Wikipedia, rich in entities and events, is an invaluable resource for various knowledge harvesting, extraction and mining tasks. Numerous resources like DBpedia, YAGO and other knowledge bases are based on extracting entity and event based knowledge from it. Online news, on the other hand, is an authoritative and rich source for emerging entities, events and facts relating to existing entities. In this work, we study the creation of entities in Wikipedia with respect to news by studying how entity and event based information flows from news to Wikipedia. We analyze the lag of Wikipedia (based on the revision history of the English Wikipedia) with 20 years of \emph{The New York Times} dataset (NYT). We model and analyze the lag of entities and events, namely their first appearance in Wikipedia and in NYT, respectively. In our extensive experimental analysis, we find that almost 20\% of the external references in entity pages are news articles encoding the importance of news to Wikipedia. Second, we observe that the entity-based lag follows a normal distribution with a high standard deviation, whereas the lag for news-based events is typically very low. Finally, we find that events are responsible for creation of emergent entities with as many as 12\% of the entities mentioned in the event page are created after the creation of the event page. ",
keywords = "cs.IR, Entity lag, Emergent entity density, Event lag, News reference density, News corpora, Wikipedia",
author = "Besnik Fetahu and Abhijit Anand and Avishek Anand",
note = "Funding information: This work was funded by the ERC Advanced Grant ALEXANDRIA under the grant number 339233.; 7th ACM Web Science Conference 2015 ; Conference date: 28-06-2015 Through 01-07-2015",
year = "2017",
doi = "10.1145/2786451.2786460",
language = "English",
isbn = "978-1-4503-3672-7",
booktitle = "WebSci '15: Proceedings of the ACM Web Science Conference",

}

TY - GEN

T1 - How much is Wikipedia Lagging Behind News?

AU - Fetahu, Besnik

AU - Anand, Abhijit

AU - Anand, Avishek

N1 - Conference code: 7

PY - 2017

Y1 - 2017

AB - Wikipedia, rich in entities and events, is an invaluable resource for various knowledge harvesting, extraction and mining tasks. Numerous resources like DBpedia, YAGO and other knowledge bases are based on extracting entity and event based knowledge from it. Online news, on the other hand, is an authoritative and rich source for emerging entities, events and facts relating to existing entities. In this work, we study the creation of entities in Wikipedia with respect to news by studying how entity and event based information flows from news to Wikipedia. We analyze the lag of Wikipedia (based on the revision history of the English Wikipedia) with 20 years of \emph{The New York Times} dataset (NYT). We model and analyze the lag of entities and events, namely their first appearance in Wikipedia and in NYT, respectively. In our extensive experimental analysis, we find that almost 20\% of the external references in entity pages are news articles encoding the importance of news to Wikipedia. Second, we observe that the entity-based lag follows a normal distribution with a high standard deviation, whereas the lag for news-based events is typically very low. Finally, we find that events are responsible for creation of emergent entities with as many as 12\% of the entities mentioned in the event page are created after the creation of the event page.

KW - cs.IR

KW - Entity lag

KW - Emergent entity density

KW - Event lag

KW - News reference density

KW - News corpora

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=84978160817&partnerID=8YFLogxK

U2 - 10.1145/2786451.2786460

DO - 10.1145/2786451.2786460

M3 - Conference contribution

SN - 978-1-4503-3672-7

BT - WebSci '15: Proceedings of the ACM Web Science Conference

T2 - 7th ACM Web Science Conference 2015

Y2 - 28 June 2015 through 1 July 2015

ER -