Two-Stream Aural-Visual Affect Analysis in the Wild

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

Felix Kuhnke, Lars Rumberg, Jörn Ostermann

Details

Original language: English
Title of host publication: Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020
Editors: Vitomir Struc, Francisco Gomez-Fernandez
Pages: 600-605
Number of pages: 6
ISBN (electronic): 978-1-7281-3079-8
Publication status: Published - 2020

Publication series

Name: Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020

Abstract

Human affect recognition is an essential part of natural human-computer interaction. However, current methods are still in their infancy, especially for in-the-wild data. In this work, we introduce our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2020 competition. We propose a two-stream aural-visual analysis model to recognize affective behavior from videos. Audio and image streams are first processed separately and fed into a convolutional neural network. Instead of applying recurrent architectures for temporal analysis we only use temporal convolutions. Furthermore, the model is given access to additional features extracted during face-alignment. At training time, we exploit correlations between different emotion representations to improve performance. Our model achieves promising results on the challenging Aff-Wild2 database. The code is publicly available at https://github.com/kuhnkeF/ABAW2020TNT.
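The abstract's core idea (two modality streams fused per frame, with temporal convolutions in place of a recurrent network) can be illustrated with a minimal standard-library sketch. The feature dimensions, concatenation-based fusion, and averaging kernel below are illustrative assumptions for exposition, not the authors' actual architecture; the real model uses convolutional neural networks over audio spectrograms and face-aligned images (see the linked repository).

```python
# Illustrative sketch (NOT the paper's implementation): per-frame audio and
# visual feature vectors are concatenated (a simple late-fusion assumption)
# and then aggregated over time with a 1-D temporal convolution rather than
# a recurrent layer, mirroring the design choice described in the abstract.

def temporal_conv1d(frames, kernel):
    """Valid 1-D convolution over the time axis.

    frames: list of per-frame feature vectors (all the same length).
    kernel: list of scalar weights applied across consecutive frames.
    """
    k = len(kernel)
    dim = len(frames[0])
    out = []
    for t in range(len(frames) - k + 1):
        window = frames[t:t + k]
        out.append([sum(kernel[i] * window[i][d] for i in range(k))
                    for d in range(dim)])
    return out

def two_stream_fuse(audio_feats, visual_feats):
    """Concatenate per-frame audio and visual features (late fusion)."""
    return [a + v for a, v in zip(audio_feats, visual_feats)]

# Toy example: 5 frames, 2-D audio features and 3-D visual features.
audio = [[1.0, 0.0]] * 5
visual = [[0.0, 1.0, 2.0]] * 5
fused = two_stream_fuse(audio, visual)                 # 5 frames x 5 dims
smoothed = temporal_conv1d(fused, [0.25, 0.5, 0.25])   # 3 frames x 5 dims
```

In the actual model each stream is a CNN and the temporal convolutions operate on learned feature maps; the point of the sketch is only the data flow: separate streams, fusion, then convolution over time instead of recurrence.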

Keywords

    action units, affective behavior analysis, emotion recognition, expression recognition, human computer interaction, valence arousal


Cite this

Two-Stream Aural-Visual Affect Analysis in the Wild. / Kuhnke, Felix; Rumberg, Lars; Ostermann, Jörn.
Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020. ed. / Vitomir Struc; Francisco Gomez-Fernandez. 2020. p. 600-605, Article 9320301 (Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020).


Kuhnke, F, Rumberg, L & Ostermann, J 2020, Two-Stream Aural-Visual Affect Analysis in the Wild. in V Struc & F Gomez-Fernandez (eds), Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020., 9320301, Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020, pp. 600-605. https://doi.org/10.48550/arXiv.2002.03399, https://doi.org/10.1109/FG47880.2020.00056
Kuhnke, F., Rumberg, L., & Ostermann, J. (2020). Two-Stream Aural-Visual Affect Analysis in the Wild. In V. Struc, & F. Gomez-Fernandez (Eds.), Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020 (pp. 600-605). Article 9320301 (Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020). https://doi.org/10.48550/arXiv.2002.03399, https://doi.org/10.1109/FG47880.2020.00056
Kuhnke F, Rumberg L, Ostermann J. Two-Stream Aural-Visual Affect Analysis in the Wild. In Struc V, Gomez-Fernandez F, editors, Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020. 2020. p. 600-605. 9320301. (Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020). doi: 10.48550/arXiv.2002.03399, 10.1109/FG47880.2020.00056
Kuhnke, Felix ; Rumberg, Lars ; Ostermann, Jörn. / Two-Stream Aural-Visual Affect Analysis in the Wild. Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020. editor / Vitomir Struc ; Francisco Gomez-Fernandez. 2020. pp. 600-605 (Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020).
@inproceedings{965b9d838aac45b49cef729ee96310e9,
title = "Two-Stream Aural-Visual Affect Analysis in the Wild",
abstract = "Human affect recognition is an essential part of natural human-computer interaction. However, current methods are still in their infancy, especially for in-the-wild data. In this work, we introduce our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2020 competition. We propose a two-stream aural-visual analysis model to recognize affective behavior from videos. Audio and image streams are first processed separately and fed into a convolutional neural network. Instead of applying recurrent architectures for temporal analysis we only use temporal convolutions. Furthermore, the model is given access to additional features extracted during face-alignment. At training time, we exploit correlations between different emotion representations to improve performance. Our model achieves promising results on the challenging Aff-Wild2 database. The code is publicly available at https://github.com/kuhnkeF/ABAW2020TNT.",
keywords = "action units, affective behavior analysis, emotion recognition, expression recognition, human computer interaction, valence arousal",
author = "Felix Kuhnke and Lars Rumberg and J{\"o}rn Ostermann",
year = "2020",
doi = "10.1109/FG47880.2020.00056",
language = "English",
isbn = "978-1-7281-3080-4",
series = "Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020",
pages = "600--605",
editor = "Vitomir Struc and Francisco Gomez-Fernandez",
booktitle = "Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020",

}


TY - GEN

T1 - Two-Stream Aural-Visual Affect Analysis in the Wild

AU - Kuhnke, Felix

AU - Rumberg, Lars

AU - Ostermann, Jörn

PY - 2020

Y1 - 2020

N2 - Human affect recognition is an essential part of natural human-computer interaction. However, current methods are still in their infancy, especially for in-the-wild data. In this work, we introduce our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2020 competition. We propose a two-stream aural-visual analysis model to recognize affective behavior from videos. Audio and image streams are first processed separately and fed into a convolutional neural network. Instead of applying recurrent architectures for temporal analysis we only use temporal convolutions. Furthermore, the model is given access to additional features extracted during face-alignment. At training time, we exploit correlations between different emotion representations to improve performance. Our model achieves promising results on the challenging Aff-Wild2 database. The code is publicly available at https://github.com/kuhnkeF/ABAW2020TNT.

AB - Human affect recognition is an essential part of natural human-computer interaction. However, current methods are still in their infancy, especially for in-the-wild data. In this work, we introduce our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2020 competition. We propose a two-stream aural-visual analysis model to recognize affective behavior from videos. Audio and image streams are first processed separately and fed into a convolutional neural network. Instead of applying recurrent architectures for temporal analysis we only use temporal convolutions. Furthermore, the model is given access to additional features extracted during face-alignment. At training time, we exploit correlations between different emotion representations to improve performance. Our model achieves promising results on the challenging Aff-Wild2 database. The code is publicly available at https://github.com/kuhnkeF/ABAW2020TNT.

KW - action units

KW - affective behavior analysis

KW - emotion recognition

KW - expression recognition

KW - human computer interaction

KW - valence arousal

UR - http://www.scopus.com/inward/record.url?scp=85101440514&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2002.03399

DO - 10.48550/arXiv.2002.03399

M3 - Conference contribution

SN - 978-1-7281-3080-4

T3 - Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020

SP - 600

EP - 605

BT - Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020

A2 - Struc, Vitomir

A2 - Gomez-Fernandez, Francisco

ER -
