Details
Original language | English |
---|---|
Title of host publication | Research and Advanced Technology for Digital Libraries - 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, Proceedings |
Editors | Yannis Manolopoulos, Jaap Kamps, Giannis Tsakonas, Lazaros Iliadis, Ioannis Karydis |
Publisher | Springer Verlag |
Pages | 61-73 |
Number of pages | 13 |
ISBN (Print) | 9783319670072 |
Publication status | Published - 2 Sept 2017 |
Event | 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017 - Thessaloniki, Greece. Duration: 18 Sept 2017 → 21 Sept 2017 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10450 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Abstract
Temporally annotated corpora about historic events can be crucial to digital humanities research: they make it possible to extract and date events as well as reactions to them, and to construct timelines of events and of language use, among other applications. However, producing a precise corpus of a particular event in history is very challenging due to the lack of noise-free digitized data. This paper introduces RussianFlu-DE, a temporally annotated corpus of 639 articles extracted from noisy OCR text of newspaper issues in German. All articles are about the Russian flu epidemic that took place during 1889–1893. We describe the development of RussianFlu-DE, including methods to clean different types of noise in the OCR text, and our tool for extracting articles related to the Russian flu. In addition, the task of temporal annotation using the TIMEX2 schema is discussed and the characteristics of the corpus compared to other corpora are presented. To show how our contribution supports epidemiology, we present some preliminary yet interesting results obtained from analyzing the articles in RussianFlu-DE. The corpus and associated tools for exploration are publicly available.
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- Computer Science (all)
- General Computer Science
Cite this
Research and Advanced Technology for Digital Libraries - 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, Proceedings. Ed. / Yannis Manolopoulos; Jaap Kamps; Giannis Tsakonas; Lazaros Iliadis; Ioannis Karydis. Springer Verlag, 2017. p. 61-73 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10450 LNCS).
Publication: Chapter in book/report/conference proceedings › Conference contribution › Research › Peer-reviewed
TY - GEN
T1 - RussianFlu-DE
T2 - 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017
AU - van Canh, Tran
AU - Markert, Katja
AU - Nejdl, Wolfgang
N1 - Funding information: Acknowledgments. This work is supported by the German Research Foundation (DFG) for the project “Tracking the Russian Flu in U.S. and German Medical and Popular Reports, 1889–1893” under Grant No. NE 638/13-1. We also thank the Austrian National Library for help in data collection.
PY - 2017/9/2
Y1 - 2017/9/2
N2 - Temporally annotated corpora about historic events can be crucial to digital humanities research: they make it possible to extract and date events as well as reactions to them, and to construct timelines of events and of language use, among other applications. However, producing a precise corpus of a particular event in history is very challenging due to the lack of noise-free digitized data. This paper introduces RussianFlu-DE, a temporally annotated corpus of 639 articles extracted from noisy OCR text of newspaper issues in German. All articles are about the Russian flu epidemic that took place during 1889–1893. We describe the development of RussianFlu-DE, including methods to clean different types of noise in the OCR text, and our tool for extracting articles related to the Russian flu. In addition, the task of temporal annotation using the TIMEX2 schema is discussed and the characteristics of the corpus compared to other corpora are presented. To show how our contribution supports epidemiology, we present some preliminary yet interesting results obtained from analyzing the articles in RussianFlu-DE. The corpus and associated tools for exploration are publicly available.
KW - Corpus in German
KW - Russian flu epidemic
KW - Temporal annotation
KW - TIMEX2
UR - http://www.scopus.com/inward/record.url?scp=85029576750&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-67008-9_6
DO - 10.1007/978-3-319-67008-9_6
M3 - Conference contribution
AN - SCOPUS:85029576750
SN - 9783319670072
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 61
EP - 73
BT - Research and Advanced Technology for Digital Libraries - 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, Proceedings
A2 - Manolopoulos, Yannis
A2 - Kamps, Jaap
A2 - Tsakonas, Giannis
A2 - Iliadis, Lazaros
A2 - Karydis, Ioannis
PB - Springer Verlag
Y2 - 18 September 2017 through 21 September 2017
ER -