Details
Original language | English |
---|---|
Article number | 113135 |
Journal | Expert Systems with Applications |
Volume | 147 |
Early online date | 18 Dec 2019 |
Publication status | Published - 1 June 2020 |
Externally published | Yes |
Abstract
Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
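The matching step described in the abstract (translate the labels, then compare them with a string similarity measure) can be illustrated with a minimal single-machine sketch. This is not the IOTA implementation: the abstract does not say which measures or distributed engine the framework uses, so the sketch assumes Python's standard-library `difflib.SequenceMatcher` as a stand-in measure, assumes the labels have already been machine-translated to English, and uses hypothetical labels and a hypothetical threshold of 0.8.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two labels, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_concepts(source, target, threshold=0.8):
    """Map each source label to its most similar target label.

    Labels whose best score falls below `threshold` are left unmatched.
    """
    mapping = {}
    for s in source:
        # Pick the target label with the highest similarity to this source label.
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            mapping[s] = (best, score)
    return mapping


# Hypothetical classification labels, assumed to be already translated to English.
dataset_a = ["Public health services", "Primary education", "Road maintenance"]
dataset_b = ["Public health service", "Education, primary",
             "Maintenance of roads", "Defence"]

links = match_concepts(dataset_a, dataset_b, threshold=0.8)
for label, (match, score) in links.items():
    print(f"{label} -> {match} ({score:.2f})")
```

In this sketch, labels that differ only by a small edit are linked, while reordered labels such as "Primary education" and "Education, primary" fall below the threshold. That sensitivity to word order is one reason a comparison of several similarity measures, as the abstract reports, is useful.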
ASJC Scopus subject areas
- Engineering (all)
- General Engineering
- Computer Science (all)
- Computer Science Applications
- Computer Science (all)
- Artificial Intelligence
Cite this
In: Expert Systems with Applications, Vol. 147, 113135, 01.06.2020.
Publication: Contribution to journal › Article › Research › Peer-review
TY - JOUR
T1 - IOTA
T2 - Interlinking of heterogeneous multilingual open fiscal DaTA
AU - Musyaffa, Fathoni A.
AU - Vidal, Maria Esther
AU - Orlandi, Fabrizio
AU - Lehmann, Jens
AU - Jabeen, Hajira
N1 - Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.
PY - 2020/6/1
Y1 - 2020/6/1
KW - Budget and spending data
KW - Cluster computing
KW - Data interlinking
KW - Open data
KW - String similarity measure
KW - Translated string matching framework
UR - http://www.scopus.com/inward/record.url?scp=85078209988&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.113135
DO - 10.1016/j.eswa.2019.113135
M3 - Article
AN - SCOPUS:85078209988
VL - 147
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
M1 - 113135
ER -