IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Fathoni A. Musyaffa
  • Maria Esther Vidal
  • Fabrizio Orlandi
  • Jens Lehmann
  • Hajira Jabeen

External Research Organisations

  • University of Bonn
  • Universidad Simon Bolivar
  • German National Library of Science and Technology (TIB)
  • Trinity College Dublin
  • Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)

Details

Original language: English
Article number: 113135
Journal: Expert Systems with Applications
Volume: 147
Early online date: 18 Dec 2019
Publication status: Published - 1 Jun 2020
Externally published: Yes

Abstract

Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
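
The matching step described in the abstract (translate the classification labels, then compare them with a string similarity measure) can be illustrated with a minimal sketch. This is not the authors' implementation: it runs on a single machine rather than on a cluster, it assumes the labels have already been machine-translated into a common language upstream, and it uses normalised Levenshtein similarity as one possible measure. The function names (`levenshtein`, `similarity`, `match_concepts`), the sample labels, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def normalise(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalised Levenshtein similarity in [0, 1] on normalised labels."""
    a, b = normalise(a), normalise(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def match_concepts(source: List[str], target: List[str],
                   threshold: float = 0.8) -> Dict[str, Tuple[str, float]]:
    """Map each source label to its most similar target label.

    Both lists are assumed to be already translated into one language.
    Pairs scoring below the (hypothetical) threshold are left unmatched.
    """
    matches: Dict[str, Tuple[str, float]] = {}
    for s in source:
        scored = [(similarity(s, t), t) for t in target]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            matches[s] = (best, score)
    return matches


if __name__ == "__main__":
    # Hypothetical, already-translated classification labels.
    dataset_a = ["Public health services", "Defence"]
    dataset_b = ["Public health service", "Road maintenance"]
    for label, (mapped, score) in match_concepts(dataset_a, dataset_b).items():
        print(f"{label!r} -> {mapped!r} ({score:.2f})")
```

The nested comparison is quadratic in the number of labels, which is the cost that a distributed implementation of string similarity, as the abstract describes, is meant to spread across a cluster.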

Keywords

    Budget and spending data, Cluster computing, Data interlinking, Open data, String similarity measure, Translated string matching framework

Cite this

IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. / Musyaffa, Fathoni A.; Vidal, Maria Esther; Orlandi, Fabrizio et al.
In: Expert Systems with Applications, Vol. 147, 113135, 01.06.2020.

Musyaffa, F. A., Vidal, M. E., Orlandi, F., Lehmann, J., & Jabeen, H. (2020). IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications, 147, Article 113135. https://doi.org/10.1016/j.eswa.2019.113135
Musyaffa FA, Vidal ME, Orlandi F, Lehmann J, Jabeen H. IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications. 2020 Jun 1;147:113135. Epub 2019 Dec 18. doi: 10.1016/j.eswa.2019.113135
Musyaffa, Fathoni A. ; Vidal, Maria Esther ; Orlandi, Fabrizio et al. / IOTA : Interlinking of heterogeneous multilingual open fiscal DaTA. In: Expert Systems with Applications. 2020 ; Vol. 147.
@article{d01be6fbdc8b4afa894f37a5e54da19a,
title = "IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA",
abstract = "Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.",
keywords = "Budget and spending data, Cluster computing, Data interlinking, Open data, String similarity measure, Translated string matching framework",
author = "Musyaffa, {Fathoni A.} and Vidal, {Maria Esther} and Fabrizio Orlandi and Jens Lehmann and Hajira Jabeen",
note = "Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.",
year = "2020",
month = jun,
day = "1",
doi = "10.1016/j.eswa.2019.113135",
language = "English",
volume = "147",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - IOTA

T2 - Interlinking of heterogeneous multilingual open fiscal DaTA

AU - Musyaffa, Fathoni A.

AU - Vidal, Maria Esther

AU - Orlandi, Fabrizio

AU - Lehmann, Jens

AU - Jabeen, Hajira

N1 - Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.

PY - 2020/6/1

Y1 - 2020/6/1

AB - Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.

KW - Budget and spending data

KW - Cluster computing

KW - Data interlinking

KW - Open data

KW - String similarity measure

KW - Translated string matching framework

UR - http://www.scopus.com/inward/record.url?scp=85078209988&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2019.113135

DO - 10.1016/j.eswa.2019.113135

M3 - Article

AN - SCOPUS:85078209988

VL - 147

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

M1 - 113135

ER -