IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA

Publication: Contribution to journal, Article, Research, Peer-reviewed

Authors

  • Fathoni A. Musyaffa
  • Maria Esther Vidal
  • Fabrizio Orlandi
  • Jens Lehmann
  • Hajira Jabeen

External organisations

  • Rheinische Friedrich-Wilhelms-Universität Bonn
  • Universidad Simon Bolivar
  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek
  • Trinity College Dublin
  • Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS)

Details

Original language: English
Article number: 113135
Journal: Expert Systems with Applications
Volume: 147
Early online date: 18 Dec 2019
Publication status: Published - 1 June 2020
Externally published: Yes

Abstract

Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
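
The matching step the abstract describes (translate classification labels, then compare them with a string similarity measure) can be illustrated with a small sketch. This is not the IOTA implementation: it runs on a single machine rather than a cluster, it assumes the labels have already been machine-translated into a common language, and it uses normalized Levenshtein similarity as one example measure. The function names, the sample labels, and the 0.7 threshold are all hypothetical.

```python
from typing import Dict, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1], case-insensitive."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def map_concepts(source: List[str], target: List[str],
                 threshold: float = 0.7) -> Dict[str, Optional[str]]:
    """Map each (already translated) source label to its most similar
    target label, or to None if no candidate reaches the threshold."""
    mapping: Dict[str, Optional[str]] = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t), default=None)
        if best is not None and similarity(s, best) >= threshold:
            mapping[s] = best
        else:
            mapping[s] = None
    return mapping


# Hypothetical labels, assumed to be translated into English beforehand.
source_labels = ["Public health", "Defence"]
target_labels = ["public health services", "education"]
print(map_concepts(source_labels, target_labels, threshold=0.5))
```

The pairwise comparison is quadratic in the number of labels, which is the cost that a distributed implementation would spread across workers.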

Cite this

IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. / Musyaffa, Fathoni A.; Vidal, Maria Esther; Orlandi, Fabrizio et al.
In: Expert Systems with Applications, Vol. 147, 113135, 1 June 2020.

Musyaffa, F. A., Vidal, M. E., Orlandi, F., Lehmann, J., & Jabeen, H. (2020). IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications, 147, Article 113135. https://doi.org/10.1016/j.eswa.2019.113135
Musyaffa FA, Vidal ME, Orlandi F, Lehmann J, Jabeen H. IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications. 2020 Jun 1;147:113135. Epub 2019 Dec 18. doi: 10.1016/j.eswa.2019.113135
Musyaffa, Fathoni A.; Vidal, Maria Esther; Orlandi, Fabrizio et al. / IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. In: Expert Systems with Applications. 2020; Vol. 147.
@article{d01be6fbdc8b4afa894f37a5e54da19a,
title = "IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA",
abstract = "Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.",
keywords = "Budget and spending data, Cluster computing, Data interlinking, Open data, String similarity measure, Translated string matching framework",
author = "Musyaffa, {Fathoni A.} and Vidal, {Maria Esther} and Fabrizio Orlandi and Jens Lehmann and Hajira Jabeen",
note = "Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.",
year = "2020",
month = jun,
day = "1",
doi = "10.1016/j.eswa.2019.113135",
language = "English",
volume = "147",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Ltd.",

}

TY - JOUR
T1 - IOTA
T2 - Interlinking of heterogeneous multilingual open fiscal DaTA
AU - Musyaffa, Fathoni A.
AU - Vidal, Maria Esther
AU - Orlandi, Fabrizio
AU - Lehmann, Jens
AU - Jabeen, Hajira
N1 - Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
AB - Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
KW - Budget and spending data
KW - Cluster computing
KW - Data interlinking
KW - Open data
KW - String similarity measure
KW - Translated string matching framework
UR - http://www.scopus.com/inward/record.url?scp=85078209988&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.113135
DO - 10.1016/j.eswa.2019.113135
M3 - Article
AN - SCOPUS:85078209988
VL - 147
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
M1 - 113135
ER -