Details
Original language | English |
---|---|
Title of host publication | WebSci '15: Proceedings of the ACM Web Science Conference |
ISBN (electronic) | 9781450336727 |
Publication status | Published - 2017 |
Event | 7th ACM Web Science Conference 2015, Oxford, United Kingdom (UK). Duration: 28 Jun 2015 → 1 Jul 2015. Conference number: 7 |
Abstract
Keywords
- cs.IR, Entity lag, Emergent entity density, Event lag, News reference density, News corpora, Wikipedia
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
Cite this
WebSci '15: Proceedings of the ACM Web Science Conference. 2017.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research
TY - GEN
T1 - How much is Wikipedia Lagging Behind News?
AU - Fetahu, Besnik
AU - Anand, Abhijit
AU - Anand, Avishek
N1 - Conference code: 7
PY - 2017
Y1 - 2017
N2 - Wikipedia, rich in entities and events, is an invaluable resource for various knowledge harvesting, extraction and mining tasks. Numerous resources like DBpedia, YAGO and other knowledge bases are based on extracting entity- and event-based knowledge from it. Online news, on the other hand, is an authoritative and rich source for emerging entities, events and facts relating to existing entities. In this work, we study the creation of entities in Wikipedia with respect to news by studying how entity- and event-based information flows from news to Wikipedia. We analyze the lag of Wikipedia (based on the revision history of the English Wikipedia) against 20 years of The New York Times (NYT) dataset. We model and analyze the lag of entities and events, namely their first appearance in Wikipedia and in NYT, respectively. In our extensive experimental analysis, we first find that almost 20% of the external references in entity pages are news articles, which reflects the importance of news to Wikipedia. Second, we observe that the entity-based lag follows a normal distribution with a high standard deviation, whereas the lag for news-based events is typically very low. Finally, we find that events are responsible for the creation of emergent entities: as many as 12% of the entities mentioned in an event page are created after the creation of the event page.
KW - cs.IR
KW - Entity lag
KW - Emergent entity density
KW - Event lag
KW - News reference density
KW - News corpora
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84978160817&partnerID=8YFLogxK
U2 - 10.1145/2786451.2786460
DO - 10.1145/2786451.2786460
M3 - Conference contribution
SN - 978-1-4503-3672-7
BT - WebSci '15: Proceedings of the ACM Web Science Conference
T2 - 7th ACM Web Science Conference 2015
Y2 - 28 June 2015 through 1 July 2015
ER -