Details
Original language | English |
---|---|
Title of host publication | IEEE Winter Conference on Applications of Computer Vision |
Subtitle of host publication | WACV 2024 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 7271-7280 |
Number of pages | 10 |
ISBN (electronic) | 9798350318920 |
ISBN (print) | 979-8-3503-1893-7 |
Publication status | Published - 2024 |
Event | IEEE/CVF Winter Conference on Applications of Computer Vision 2024 - Waikoloa, United States; Duration: 3 Jan 2024 → 8 Jan 2024 |
Abstract
Event classification in images plays a vital role in multimedia analysis, especially given the prevalence of fake news on social media and the Web. Most approaches to event classification rely on large sets of labeled training data. However, image labels for fine-grained event instances (e.g., 2016 Summer Olympics) can be sparse, incorrect, ambiguous, etc. A few approaches have addressed the lack of labeled data for event classification but cover only a few events. Moreover, vision-language models that allow for zero-shot and few-shot classification with prompting have not yet been extensively exploited. In this paper, we propose four different techniques to create hard prompts, including knowledge graph information from Wikidata and Wikipedia, as well as an ensemble approach for zero-shot event classification. We also integrate prompt learning for state-of-the-art vision-language models to address few-shot event classification. Experimental results on six benchmarks, including a new dataset comprising event instances from various domains such as politics and natural disasters, show that our proposed approaches require far fewer training images than supervised baselines and the state of the art while achieving better results.
Keywords
- Algorithms, Applications, Arts / games / social media, Vision + language and/or other modalities
ASJC Scopus subject areas
- Computer Science(all)
- Artificial Intelligence
- Computer Science Applications
- Computer Vision and Pattern Recognition
Cite this
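Tahmasebzadeh, Golsa; Springstein, Matthias; Ewerth, Ralph; Müller-Budack, Eric. Few-Shot Event Classification in Images using Knowledge Graphs for Prompting. In: IEEE Winter Conference on Applications of Computer Vision: WACV 2024. Institute of Electrical and Electronics Engineers Inc., 2024. p. 7271-7280.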
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Few-Shot Event Classification in Images using Knowledge Graphs for Prompting
AU - Tahmasebzadeh, Golsa
AU - Springstein, Matthias
AU - Ewerth, Ralph
AU - Müller-Budack, Eric
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Event classification in images plays a vital role in multimedia analysis, especially given the prevalence of fake news on social media and the Web. Most approaches to event classification rely on large sets of labeled training data. However, image labels for fine-grained event instances (e.g., 2016 Summer Olympics) can be sparse, incorrect, ambiguous, etc. A few approaches have addressed the lack of labeled data for event classification but cover only a few events. Moreover, vision-language models that allow for zero-shot and few-shot classification with prompting have not yet been extensively exploited. In this paper, we propose four different techniques to create hard prompts, including knowledge graph information from Wikidata and Wikipedia, as well as an ensemble approach for zero-shot event classification. We also integrate prompt learning for state-of-the-art vision-language models to address few-shot event classification. Experimental results on six benchmarks, including a new dataset comprising event instances from various domains such as politics and natural disasters, show that our proposed approaches require far fewer training images than supervised baselines and the state of the art while achieving better results.
KW - Algorithms
KW - Applications
KW - Arts / games / social media
KW - Vision + language and/or other modalities
UR - http://www.scopus.com/inward/record.url?scp=85191986086&partnerID=8YFLogxK
U2 - 10.1109/WACV57701.2024.00712
DO - 10.1109/WACV57701.2024.00712
M3 - Conference contribution
AN - SCOPUS:85191986086
SN - 979-8-3503-1893-7
SP - 7271
EP - 7280
BT - IEEE Winter Conference on Applications of Computer Vision
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE/CVF Winter Conference on Applications of Computer Vision 2024
Y2 - 3 January 2024 through 8 January 2024
ER -