Multimodal metadata assignment for cultural heritage artifacts

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Luis Rei
  • Dunja Mladenic
  • Mareike Dorozynski
  • Franz Rottensteiner
  • Thomas Schleider
  • Raphaël Troncy
  • Jorge Sebastián Lozano
  • Mar Gaitán Salvatella

External Research Organisations

  • Jožef Stefan Institute (JSI)
  • EURECOM - Graduate School and Research Center in Digital Sciences
  • Universitat de Valencia

Details

Original language: English
Pages (from-to): 847-869
Number of pages: 23
Journal: Multimedia systems
Volume: 29
Issue number: 2
Early online date: 21 Nov 2022
Publication status: Published - Apr 2023

Abstract

We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
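
The late fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that each multitask classifier emits one probability vector per task and that the fused input is simply the concatenation of those vectors with the tabular features. The task names ("material", "technique"), the function name fuse_late, and the example values are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence

# One probability vector per task, as a multitask classifier might emit.
TaskProbs = Dict[str, Sequence[float]]


def fuse_late(
    image_probs: TaskProbs,
    text_probs: TaskProbs,
    tabular: Sequence[float],
) -> List[float]:
    """Concatenate per-modality outputs into one feature vector.

    Tasks are visited in sorted order so that every artifact yields
    features in the same column layout.
    """
    fused: List[float] = []
    for probs in (image_probs, text_probs):
        for task in sorted(probs):
            fused.extend(probs[task])
    fused.extend(tabular)
    return fused


# Hypothetical outputs for a single artifact (illustrative values only).
image_probs = {"material": [0.7, 0.3], "technique": [0.2, 0.5, 0.3]}
text_probs = {"material": [0.6, 0.4], "technique": [0.1, 0.8, 0.1]}
tabular = [1.0, 0.0]  # hypothetical one-hot encoded tabular field

features = fuse_late(image_probs, text_probs, tabular)
print(len(features))
```

In a setup like the one the abstract outlines, vectors of this kind would be the training input for the gradient tree boosting model, which makes the final prediction for each task.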

Keywords

    Convolutional neural networks, Cultural heritage, Deep learning, Image classification, Multilingual, Multimodal, Text classification, Transformer

Cite this

Multimodal metadata assignment for cultural heritage artifacts. / Rei, Luis; Mladenic, Dunja; Dorozynski, Mareike et al.
In: Multimedia systems, Vol. 29, No. 2, 04.2023, p. 847-869.

Rei, L, Mladenic, D, Dorozynski, M, Rottensteiner, F, Schleider, T, Troncy, R, Lozano, JS & Salvatella, MG 2023, 'Multimodal metadata assignment for cultural heritage artifacts', Multimedia systems, vol. 29, no. 2, pp. 847-869. https://doi.org/10.21203/rs.3.rs-1708875/v1, https://doi.org/10.1007/s00530-022-01025-2
Rei, L., Mladenic, D., Dorozynski, M., Rottensteiner, F., Schleider, T., Troncy, R., Lozano, J. S., & Salvatella, M. G. (2023). Multimodal metadata assignment for cultural heritage artifacts. Multimedia systems, 29(2), 847-869. https://doi.org/10.21203/rs.3.rs-1708875/v1, https://doi.org/10.1007/s00530-022-01025-2
Rei L, Mladenic D, Dorozynski M, Rottensteiner F, Schleider T, Troncy R et al. Multimodal metadata assignment for cultural heritage artifacts. Multimedia systems. 2023 Apr;29(2):847-869. Epub 2022 Nov 21. doi: 10.21203/rs.3.rs-1708875/v1, 10.1007/s00530-022-01025-2
Rei, Luis ; Mladenic, Dunja ; Dorozynski, Mareike et al. / Multimodal metadata assignment for cultural heritage artifacts. In: Multimedia systems. 2023 ; Vol. 29, No. 2. pp. 847-869.
@article{a09b09849af44f7599f57baa14e242fd,
title = "Multimodal metadata assignment for cultural heritage artifacts",
abstract = "We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.",
keywords = "Convolutional neural networks, Cultural heritage, Deep learning, Image classification, Multilingual, Multimodal, Text classification, Transformer",
author = "Luis Rei and Dunja Mladenic and Mareike Dorozynski and Franz Rottensteiner and Thomas Schleider and Rapha{\"e}l Troncy and Lozano, {Jorge Sebasti{\'a}n} and Salvatella, {Mar Gait{\'a}n}",
note = "Funding Information: This work was supported by the Slovenian Research Agency and the European Union{\textquoteright}s Horizon 2020 research and innovation program under SILKNOW grant agreement No. 769504. ",
year = "2023",
month = apr,
doi = "10.21203/rs.3.rs-1708875/v1",
language = "English",
volume = "29",
pages = "847--869",
journal = "Multimedia systems",
issn = "0942-4962",
publisher = "Springer Verlag",
number = "2",

}

TY - JOUR

T1 - Multimodal metadata assignment for cultural heritage artifacts

AU - Rei, Luis

AU - Mladenic, Dunja

AU - Dorozynski, Mareike

AU - Rottensteiner, Franz

AU - Schleider, Thomas

AU - Troncy, Raphaël

AU - Lozano, Jorge Sebastián

AU - Salvatella, Mar Gaitán

N1 - Funding Information: This work was supported by the Slovenian Research Agency and the European Union’s Horizon 2020 research and innovation program under SILKNOW grant agreement No. 769504.

PY - 2023/4

Y1 - 2023/4

N2 - We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.

AB - We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.

KW - Convolutional neural networks

KW - Cultural heritage

KW - Deep learning

KW - Image classification

KW - Multilingual

KW - Multimodal

KW - Text classification

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=85142285336&partnerID=8YFLogxK

U2 - 10.21203/rs.3.rs-1708875/v1

DO - 10.21203/rs.3.rs-1708875/v1

M3 - Article

AN - SCOPUS:85142285336

VL - 29

SP - 847

EP - 869

JO - Multimedia systems

JF - Multimedia systems

SN - 0942-4962

IS - 2

ER -