Details
Original language | English |
---|---|
Pages (from-to) | 847-869 |
Number of pages | 23 |
Journal | Multimedia systems |
Volume | 29 |
Issue number | 2 |
Early online date | 21 Nov 2022 |
Publication status | Published - Apr 2023 |
Abstract
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
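As a rough illustration of the late fusion idea described in the abstract, the sketch below concatenates the class probabilities from an image model and a text model with tabular features, then fits a small gradient-boosted ensemble of decision stumps on the fused vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fuse`, `fit_boosting`, `predict`), the toy probabilities, the binary label, and the squared-error stump learner are all hypothetical, and a real system would use a full gradient tree boosting library and one output per task.

```python
# Hypothetical late-fusion sketch; names and data are illustrative assumptions.
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, value if <= threshold, value if > threshold).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fuse(image_probs: Sequence[float], text_probs: Sequence[float],
         tabular: Sequence[float]) -> List[float]:
    """Build the late-fusion input: per-modality outputs plus tabular features."""
    return list(image_probs) + list(text_probs) + list(tabular)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Pick the single-feature threshold split with the lowest squared error."""
    best, best_err = (0, 0.0, 0.0, 0.0), float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv = sum(left) / len(left)  # left is never empty: t is an observed value
            rv = sum(right) / len(right) if right else lv
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def stump_predict(stump: Stump, row: Sequence[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_boosting(X: List[List[float]], y: List[float],
                 rounds: int = 20, lr: float = 0.5) -> Model:
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def predict(model: Model, row: Sequence[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


# Toy data (made up): two-class probabilities per modality and one tabular feature.
X_train = [
    fuse([0.9, 0.1], [0.8, 0.2], [1.0]),
    fuse([0.7, 0.3], [0.4, 0.6], [1.0]),
    fuse([0.2, 0.8], [0.3, 0.7], [0.0]),
    fuse([0.4, 0.6], [0.1, 0.9], [0.0]),
]
y_train = [1.0, 1.0, 0.0, 0.0]

model = fit_boosting(X_train, y_train)
score = predict(model, fuse([0.8, 0.2], [0.6, 0.4], [1.0]))
print("fused score for the positive class:", round(score, 3))
```

The point of the design is that the second-stage learner only sees the outputs of the per-modality models, so each modality can be trained separately and a record with a missing modality could, in principle, be handled by the fusion model alone.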
Keywords
- Convolutional neural networks, Cultural heritage, Deep learning, Image classification, Multilingual, Multimodal, Text classification, Transformer
ASJC Scopus subject areas
- Computer Science(all): Software
- Computer Science(all): Information Systems
- Engineering(all): Media Technology
- Computer Science(all): Hardware and Architecture
- Computer Science(all): Computer Networks and Communications
Cite this
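Multimodal metadata assignment for cultural heritage artifacts. / Rei, Luis; Mladenic, Dunja; Dorozynski, Mareike; Rottensteiner, Franz; Schleider, Thomas; Troncy, Raphaël; Lozano, Jorge Sebastián; Salvatella, Mar Gaitán. In: Multimedia systems, Vol. 29, No. 2, 04.2023, p. 847-869.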
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Multimodal metadata assignment for cultural heritage artifacts
AU - Rei, Luis
AU - Mladenic, Dunja
AU - Dorozynski, Mareike
AU - Rottensteiner, Franz
AU - Schleider, Thomas
AU - Troncy, Raphaël
AU - Lozano, Jorge Sebastián
AU - Salvatella, Mar Gaitán
N1 - Funding Information: This work was supported by the Slovenian Research Agency and the European Union’s Horizon 2020 research and innovation program under SILKNOW grant agreement No. 769504.
PY - 2023/4
Y1 - 2023/4
N2 - We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
AB - We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
KW - Convolutional neural networks
KW - Cultural heritage
KW - Deep learning
KW - Image classification
KW - Multilingual
KW - Multimodal
KW - Text classification
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85142285336&partnerID=8YFLogxK
U2 - 10.21203/rs.3.rs-1708875/v1
DO - 10.21203/rs.3.rs-1708875/v1
M3 - Article
AN - SCOPUS:85142285336
VL - 29
SP - 847
EP - 869
JO - Multimedia systems
JF - Multimedia systems
SN - 0942-4962
IS - 2
ER -