IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA

Fathoni A. Musyaffa; Maria Esther Vidal; Fabrizio Orlandi; Jens Lehmann; Hajira Jabeen

doi:10.1016/j.eswa.2019.113135

Details

Original language	English
Article number	113135
Journal	Expert Systems with Applications
Volume	147
Early online date	18 Dec 2019
Publication status	Published - 1 Jun 2020
Externally published	Yes

Abstract

Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
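The matching step described in the abstract (machine translation followed by string similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the labels have already been machine-translated into English, uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure, and runs on a single machine rather than a cluster. The function names, sample labels, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_labels(source, target, threshold=0.8):
    """Map each source label to its best-scoring target label.

    A pair is kept only if its score reaches the threshold
    (the threshold value is an assumption, not taken from the paper).
    """
    matches = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            matches[s] = (best, score)
    return matches


# Hypothetical classification labels, assumed already translated to English.
dataset_a = ["Public health services", "Primary education", "Road maintenance"]
dataset_b = ["Public health service", "Education, primary", "Waste management"]

links = match_labels(dataset_a, dataset_b, threshold=0.8)
for label, (candidate, score) in links.items():
    print(f"{label!r} -> {candidate!r} ({score:.2f})")
```

With these sample labels, only the near-identical pair is linked; "Primary education" and "Education, primary" stay unmatched because a character-sequence measure is sensitive to word order. A token-based measure would behave differently on such pairs, which is one reason comparing several similarity measures is useful.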

Keywords

Budget and spending data, Cluster computing, Data interlinking, Open data, String similarity measure, Translated string matching framework

ASJC Scopus subject areas

Engineering (all): General Engineering
Computer Science (all): Computer Science Applications
Computer Science (all): Artificial Intelligence

Cite this

IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. / Musyaffa, Fathoni A.; Vidal, Maria Esther; Orlandi, Fabrizio et al.
In: Expert Systems with Applications, Vol. 147, 113135, 01.06.2020.

Research output: Contribution to journal › Article › Research › peer review

Musyaffa, FA, Vidal, ME, Orlandi, F, Lehmann, J & Jabeen, H 2020, 'IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA', Expert Systems with Applications, vol. 147, 113135. https://doi.org/10.1016/j.eswa.2019.113135

Musyaffa, F. A., Vidal, M. E., Orlandi, F., Lehmann, J., & Jabeen, H. (2020). IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications, 147, Article 113135. https://doi.org/10.1016/j.eswa.2019.113135

Musyaffa FA, Vidal ME, Orlandi F, Lehmann J, Jabeen H. IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications. 2020 Jun 1;147:113135. Epub 2019 Dec 18. doi: 10.1016/j.eswa.2019.113135

Musyaffa, Fathoni A.; Vidal, Maria Esther; Orlandi, Fabrizio et al. / IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. In: Expert Systems with Applications. 2020; Vol. 147.


@article{d01be6fbdc8b4afa894f37a5e54da19a,
title = "IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA",
abstract = "Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.",
keywords = "Budget and spending data, Cluster computing, Data interlinking, Open data, String similarity measure, Translated string matching framework",
author = "Musyaffa, {Fathoni A.} and Vidal, {Maria Esther} and Orlandi, Fabrizio and Lehmann, Jens and Jabeen, Hajira",
note = "Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.",
year = "2020",
month = jun,
day = "1",
doi = "10.1016/j.eswa.2019.113135",
language = "English",
volume = "147",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Ltd.",
}


TY - JOUR
T1 - IOTA
T2 - Interlinking of heterogeneous multilingual open fiscal DaTA
AU - Musyaffa, Fathoni A.
AU - Vidal, Maria Esther
AU - Orlandi, Fabrizio
AU - Lehmann, Jens
AU - Jabeen, Hajira
N1 - Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
AB - Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
KW - Budget and spending data
KW - Cluster computing
KW - Data interlinking
KW - Open data
KW - String similarity measure
KW - Translated string matching framework
UR - http://www.scopus.com/inward/record.url?scp=85078209988&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.113135
DO - 10.1016/j.eswa.2019.113135
M3 - Article
AN - SCOPUS:85078209988
VL - 147
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
M1 - 113135
ER -

Research@Leibniz University
