TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets

Pavlos Fafalios; Vasileios Iosifidis; Eirini Ntoutsi; Stefan Dietze

doi:10.1007/978-3-319-93417-4_12

Details

Original language	English
Title of host publication	The Semantic Web
Subtitle of host publication	15th International Conference
Editors	Aldo Gangemi, Raphaël Troncy, Roberto Navigli, Laura Hollink, Maria-Esther Vidal, Pascal Hitzler, Anna Tordai, Mehwish Alam
Publisher	Springer Verlag
Pages	177-190
Number of pages	14
ISBN (electronic)	9783319934174
ISBN (print)	9783319934167
Publication status	Published - 3 Jun 2018
Event	15th International Conference on Extended Semantic Web Conference, ESWC 2018 - Heraklion, Greece. Duration: 3 Jun 2018 → 7 Jun 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	10843
ISSN (Print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, spanning almost 5 years (Jan’13–Nov’17). Metadata information about the tweets as well as extracted entities, hashtags, user mentions and sentiment information are exposed using established RDF/S vocabularies. Next to a description of the extraction and annotation process, we present use cases to illustrate scenarios for entity-centric information exploration, data integration and knowledge discovery facilitated by TweetsKB.

Keywords

Entity linking, RDF, Sentiment analysis, Social media archives, Twitter

ASJC Scopus subject areas

Mathematics(all)
Theoretical Computer Science
Computer Science(all)
General Computer Science

Cite this

TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets. / Fafalios, Pavlos; Iosifidis, Vasileios; Ntoutsi, Eirini et al.
The Semantic Web: 15th International Conference. ed. / Aldo Gangemi; Raphaël Troncy; Roberto Navigli; Laura Hollink; Maria-Esther Vidal; Pascal Hitzler; Anna Tordai; Mehwish Alam. Springer Verlag, 2018. p. 177-190 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10843).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Fafalios, P, Iosifidis, V, Ntoutsi, E & Dietze, S 2018, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets. in A Gangemi, R Troncy, R Navigli, L Hollink, M-E Vidal, P Hitzler, A Tordai & M Alam (eds), The Semantic Web: 15th International Conference. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10843, Springer Verlag, pp. 177-190, 15th International Conference on Extended Semantic Web Conference, ESWC 2018, Heraklion, Greece, 3 Jun 2018. https://doi.org/10.1007/978-3-319-93417-4_12

Fafalios, P., Iosifidis, V., Ntoutsi, E., & Dietze, S. (2018). TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets. In A. Gangemi, R. Troncy, R. Navigli, L. Hollink, M.-E. Vidal, P. Hitzler, A. Tordai, & M. Alam (Eds.), The Semantic Web: 15th International Conference (pp. 177-190). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10843). Springer Verlag. https://doi.org/10.1007/978-3-319-93417-4_12

Fafalios P, Iosifidis V, Ntoutsi E, Dietze S. TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets. In Gangemi A, Troncy R, Navigli R, Hollink L, Vidal ME, Hitzler P, Tordai A, Alam M, editors, The Semantic Web: 15th International Conference. Springer Verlag. 2018. p. 177-190. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-93417-4_12

Fafalios, Pavlos ; Iosifidis, Vasileios ; Ntoutsi, Eirini et al. / TweetsKB : A Public and Large-Scale RDF Corpus of Annotated Tweets. The Semantic Web: 15th International Conference. editor / Aldo Gangemi ; Raphaël Troncy ; Roberto Navigli ; Laura Hollink ; Maria-Esther Vidal ; Pascal Hitzler ; Anna Tordai ; Mehwish Alam. Springer Verlag, 2018. pp. 177-190 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{3a6e0771606e4e58badc625ab14d4acb,

title = "TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets",

abstract = "Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, spanning almost 5 years (Jan{\textquoteright}13–Nov{\textquoteright}17). Metadata information about the tweets as well as extracted entities, hashtags, user mentions and sentiment information are exposed using established RDF/S vocabularies. Next to a description of the extraction and annotation process, we present use cases to illustrate scenarios for entity-centric information exploration, data integration and knowledge discovery facilitated by TweetsKB.",

keywords = "Entity linking, RDF, Sentiment analysis, Social media archives, Twitter",

author = "Pavlos Fafalios and Vasileios Iosifidis and Eirini Ntoutsi and Stefan Dietze",

note = "Funding information: The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA under grant No. 339233 and the H2020 Grant No. 687916 (AFEL project), and by the German Research Foundation (DFG) project OSCAR (Opinion Stream Classification with Ensembles and Active leaRners). 15th International Conference on Extended Semantic Web Conference, ESWC 2018; Conference date: 03-06-2018 through 07-06-2018",

year = "2018",

month = jun,

day = "3",

doi = "10.1007/978-3-319-93417-4_12",

language = "English",

isbn = "9783319934167",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "177--190",

editor = "Aldo Gangemi and Rapha{\"e}l Troncy and Roberto Navigli and Laura Hollink and Maria-Esther Vidal and Pascal Hitzler and Anna Tordai and Mehwish Alam",

booktitle = "The Semantic Web",

address = "Germany",

}

TY - GEN

T1 - TweetsKB

T2 - 15th International Conference on Extended Semantic Web Conference, ESWC 2018

AU - Fafalios, Pavlos

AU - Iosifidis, Vasileios

AU - Ntoutsi, Eirini

AU - Dietze, Stefan

N1 - Funding information: The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA under grant No. 339233 and the H2020 Grant No. 687916 (AFEL project), and by the German Research Foundation (DFG) project OSCAR (Opinion Stream Classification with Ensembles and Active leaRners).

PY - 2018/6/3

Y1 - 2018/6/3

N2 - Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, spanning almost 5 years (Jan’13–Nov’17). Metadata information about the tweets as well as extracted entities, hashtags, user mentions and sentiment information are exposed using established RDF/S vocabularies. Next to a description of the extraction and annotation process, we present use cases to illustrate scenarios for entity-centric information exploration, data integration and knowledge discovery facilitated by TweetsKB.

AB - Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, spanning almost 5 years (Jan’13–Nov’17). Metadata information about the tweets as well as extracted entities, hashtags, user mentions and sentiment information are exposed using established RDF/S vocabularies. Next to a description of the extraction and annotation process, we present use cases to illustrate scenarios for entity-centric information exploration, data integration and knowledge discovery facilitated by TweetsKB.

KW - Entity linking

KW - RDF

KW - Sentiment analysis

KW - Social media archives

KW - Twitter

UR - http://www.scopus.com/inward/record.url?scp=85048487300&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-93417-4_12

DO - 10.1007/978-3-319-93417-4_12

M3 - Conference contribution

AN - SCOPUS:85048487300

SN - 9783319934167

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 177

EP - 190

BT - The Semantic Web

A2 - Gangemi, Aldo

A2 - Troncy, Raphaël

A2 - Navigli, Roberto

A2 - Hollink, Laura

A2 - Vidal, Maria-Esther

A2 - Hitzler, Pascal

A2 - Tordai, Anna

A2 - Alam, Mehwish

PB - Springer Verlag

Y2 - 3 June 2018 through 7 June 2018

ER -
