IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA

Fathoni A. Musyaffa; Maria Esther Vidal; Fabrizio Orlandi; Jens Lehmann; Hajira Jabeen

doi:10.1016/j.eswa.2019.113135

Details

Original language	English
Article number	113135
Journal	Expert Systems with Applications
Volume	147
Early online date	18 Dec 2019
Publication status	Published - 1 June 2020
Externally published	Yes

Abstract

Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
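The matching step described in the abstract (translate labels, then compare them with a string similarity measure) can be illustrated with a minimal sketch. This is not IOTA's implementation: it assumes the classification labels have already been machine-translated into English, it uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for the similarity measures compared in the paper, and it runs on a single machine rather than on a distributed cluster. The labels, the `match_concepts` helper, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] between two labels, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_concepts(source, target, threshold=0.8):
    """Map each source label to its best-scoring target label.

    Source labels whose best score falls below `threshold` stay unmatched.
    """
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            mapping[s] = (best, score)
    return mapping


# Hypothetical labels, assumed to be already machine-translated into English.
dataset_a = ["Public health services", "Primary education", "Road maintenance"]
dataset_b = ["Public health service", "Primary school education", "Waste disposal"]

matches = match_concepts(dataset_a, dataset_b, threshold=0.8)
for label, (candidate, score) in matches.items():
    print(f"{label} -> {candidate} ({score:.2f})")
```

The nested comparison is quadratic in the number of labels, which is the kind of cost that motivates the distributed implementation mentioned in the abstract.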

ASJC Scopus subject areas

Engineering (all)
General Engineering
Computer Science (all)
Computer Science Applications
Computer Science (all)
Artificial Intelligence

Cite

IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. / Musyaffa, Fathoni A.; Vidal, Maria Esther; Orlandi, Fabrizio et al.
In: Expert Systems with Applications, Vol. 147, 113135, 01.06.2020.

Research output: Contribution to journal › Article › Research › Peer-reviewed

Musyaffa, FA, Vidal, ME, Orlandi, F, Lehmann, J & Jabeen, H 2020, 'IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA', Expert Systems with Applications, vol. 147, 113135. https://doi.org/10.1016/j.eswa.2019.113135

Musyaffa, F. A., Vidal, M. E., Orlandi, F., Lehmann, J., & Jabeen, H. (2020). IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications, 147, Article 113135. https://doi.org/10.1016/j.eswa.2019.113135

Musyaffa FA, Vidal ME, Orlandi F, Lehmann J, Jabeen H. IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA. Expert Systems with Applications. 2020 Jun 1;147:113135. Epub 2019 Dec 18. doi: 10.1016/j.eswa.2019.113135

Musyaffa, Fathoni A. ; Vidal, Maria Esther ; Orlandi, Fabrizio et al. / IOTA : Interlinking of heterogeneous multilingual open fiscal DaTA. In: Expert Systems with Applications. 2020 ; Vol. 147.

@article{d01be6fbdc8b4afa894f37a5e54da19a,

title = "IOTA: Interlinking of heterogeneous multilingual open fiscal DaTA",

abstract = "Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.",

keywords = "Budget and spending data, Cluster computing, Data interlinking, Open data, String similarity measure, Translated string matching framework",

author = "Musyaffa, {Fathoni A.} and Vidal, {Maria Esther} and Fabrizio Orlandi and Jens Lehmann and Hajira Jabeen",

note = "Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.",

year = "2020",

month = jun,

day = "1",

doi = "10.1016/j.eswa.2019.113135",

language = "English",

volume = "147",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - IOTA

T2 - Interlinking of heterogeneous multilingual open fiscal DaTA

AU - Musyaffa, Fathoni A.

AU - Vidal, Maria Esther

AU - Orlandi, Fabrizio

AU - Lehmann, Jens

AU - Jabeen, Hajira

N1 - Funding information: The first author would like to express gratitude to the German Academic Exchange Service (DAAD) for supporting him with a PhD scholarship.

PY - 2020/6/1

Y1 - 2020/6/1

N2 - Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.

AB - Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.

KW - Budget and spending data

KW - Cluster computing

KW - Data interlinking

KW - Open data

KW - String similarity measure

KW - Translated string matching framework

UR - http://www.scopus.com/inward/record.url?scp=85078209988&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2019.113135

DO - 10.1016/j.eswa.2019.113135

M3 - Article

AN - SCOPUS:85078209988

VL - 147

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

M1 - 113135

ER -
