Details
Original language | English |
---|---|
Pages (from - to) | 847-869 |
Number of pages | 23 |
Journal | Multimedia systems |
Volume | 29 |
Issue number | 2 |
Early online date | 21 Nov 2022 |
Publication status | Published - Apr 2023 |
Abstract
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
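To make the late fusion step in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the image and text classifiers each emit a class-probability vector, that these vectors are concatenated with the tabular features, and that a boosted ensemble of trees is trained on the result. All names (`fuse`, `fit_stump`, `fit_boosting`, `predict`), the toy data, and the binary label are hypothetical; the toy booster uses depth-1 trees and squared loss, and it ignores the multitask, multi-class setting of the paper.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, value if x <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]  # (base score, learning rate, trees)


def fuse(image_probs: Sequence[float], text_probs: Sequence[float],
         tabular: Sequence[float]) -> List[float]:
    """Late fusion input: per-modality class scores plus tabular features."""
    return [*image_probs, *text_probs, *tabular]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Fit a depth-1 regression tree to the residuals by exhaustive search."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: constant prediction
    best_err = sum((r - mean) ** 2 for r in residuals)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if err < best_err:
                best, best_err = (j, thr, lv, rv), err
    return best


def stump_value(stump: Stump, x: Sequence[float]) -> float:
    j, thr, lv, rv = stump
    return lv if x[j] <= thr else rv


def fit_boosting(X: List[List[float]], y: List[int],
                 rounds: int = 10, lr: float = 0.5) -> Model:
    """Gradient boosting with squared loss: each tree fits the residuals."""
    base = sum(y) / len(y)
    scores = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        # Negative gradient of 1/2 * (y - score)^2 with respect to the score.
        residuals = [t - s for t, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_value(stump, x) for s, x in zip(scores, X)]
    return base, lr, stumps


def predict(model: Model, x: Sequence[float]) -> int:
    base, lr, stumps = model
    score = base + sum(lr * stump_value(s, x) for s in stumps)
    return int(score >= 0.5)


# Hypothetical toy data: (image probabilities, text probabilities, tabular, label).
samples = [
    ([0.9, 0.1], [0.8, 0.2], [0.0], 0),
    ([0.2, 0.8], [0.3, 0.7], [1.0], 1),
    ([0.6, 0.4], [0.2, 0.8], [1.0], 1),  # image and text disagree
    ([0.4, 0.6], [0.7, 0.3], [0.0], 0),  # image and text disagree
    ([0.1, 0.9], [0.4, 0.6], [1.0], 1),
    ([0.7, 0.3], [0.9, 0.1], [0.0], 0),
]
X = [fuse(img, txt, tab) for img, txt, tab, _ in samples]
y = [label for *_, label in samples]

model = fit_boosting(X, y)
train_predictions = [predict(model, x) for x in X]
new_prediction = predict(model, fuse([0.55, 0.45], [0.25, 0.75], [1.0]))
```

The point of the sketch is the data flow: the fusion model never sees raw images or text, only the upstream classifiers' outputs alongside the tabular features. In practice a library implementation of gradient tree boosting would replace the toy booster.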
ASJC Scopus subject areas
- Computer Science (all): Software
- Computer Science (all): Information Systems
- Engineering (all): Media Technology
- Computer Science (all): Hardware and Architecture
- Computer Science (all): Computer Networks and Communications
Cite this
In: Multimedia systems, vol. 29, no. 2, 04.2023, pp. 847-869.
Publication: Contribution to journal › Article › Research › Peer-reviewed
TY - JOUR
T1 - Multimodal metadata assignment for cultural heritage artifacts
AU - Rei, Luis
AU - Mladenic, Dunja
AU - Dorozynski, Mareike
AU - Rottensteiner, Franz
AU - Schleider, Thomas
AU - Troncy, Raphaël
AU - Lozano, Jorge Sebastián
AU - Salvatella, Mar Gaitán
N1 - Funding Information: This work was supported by the Slovenian Research Agency and the European Union’s Horizon 2020 research and innovation program under SILKNOW grant agreement No. 769504.
PY - 2023/4
Y1 - 2023/4
N2 - We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
KW - Convolutional neural networks
KW - Cultural heritage
KW - Deep learning
KW - Image classification
KW - Multilingual
KW - Multimodal
KW - Text classification
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85142285336&partnerID=8YFLogxK
U2 - 10.21203/rs.3.rs-1708875/v1
DO - 10.21203/rs.3.rs-1708875/v1
M3 - Article
AN - SCOPUS:85142285336
VL - 29
SP - 847
EP - 869
JO - Multimedia systems
JF - Multimedia systems
SN - 0942-4962
IS - 2
ER -