Details
Original language | English |
---|---|
Article number | 113135 |
Journal | Expert Systems with Applications |
Volume | 147 |
Early online date | 18 Dec 2019 |
Publication status | Published - 1 Jun 2020 |
Externally published | Yes |
Abstract
Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
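The matching step described in the abstract (translate labels, then compare them with a string similarity measure) can be illustrated with a minimal sketch. This is not the IOTA implementation: it assumes the labels have already been machine-translated into English, uses Python's standard `difflib.SequenceMatcher` as a stand-in for whichever similarity measure is chosen, and runs on a single machine rather than a cluster. The function names, example labels, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(label.lower().split())


def similarity(a: str, b: str) -> float:
    """Score in [0, 1] from difflib; a stand-in for any string similarity measure."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_concepts(source, target, threshold=0.8):
    """Map each source label to its best-scoring target label, if the score reaches the threshold."""
    matches = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t), default=None)
        if best is not None and similarity(s, best) >= threshold:
            matches[s] = best
    return matches


# Hypothetical classification labels, assumed to be already translated into English.
source_labels = ["Public health services", "Road maintenance"]
target_labels = ["Public Health Service", "Maintenance of roads", "Primary education"]

result = match_concepts(source_labels, target_labels)
```

In this sketch "Public health services" is mapped to "Public Health Service", while "Road maintenance" stays unmatched because a character-sequence measure penalizes word reordering. That sensitivity is one reason the choice of similarity measure matters for this kind of matching.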
Keywords
- Budget and spending data, Cluster computing, Data interlinking, Open data, String similarity measure, Translated string matching framework
ASJC Scopus subject areas
- Engineering(all)
  - General Engineering
- Computer Science(all)
  - Computer Science Applications
  - Artificial Intelligence
Cite this
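Musyaffa, Fathoni A.; Vidal, Maria Esther; Orlandi, Fabrizio; Lehmann, Jens; Jabeen, Hajira. IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. In: Expert Systems with Applications, Vol. 147, 113135, 01.06.2020.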
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - IOTA
T2 - Interlinking of heterogeneous multilingual open fiscal DaTA
AU - Musyaffa, Fathoni A.
AU - Vidal, Maria Esther
AU - Orlandi, Fabrizio
AU - Lehmann, Jens
AU - Jabeen, Hajira
N1 - Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.
PY - 2020/6/1
Y1 - 2020/6/1
KW - Budget and spending data
KW - Cluster computing
KW - Data interlinking
KW - Open data
KW - String similarity measure
KW - Translated string matching framework
UR - http://www.scopus.com/inward/record.url?scp=85078209988&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.113135
DO - 10.1016/j.eswa.2019.113135
M3 - Article
AN - SCOPUS:85078209988
VL - 147
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
M1 - 113135
ER -