Realistic facial expression synthesis for an image-based talking head

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review


Details

Original language: English
Title of host publication: Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: ICME 2011
Publication status: Published - September 2011
Event: 2011 12th IEEE International Conference on Multimedia and Expo, ICME 2011 - Barcelona, Spain
Duration: 11 Jul 2011 - 15 Jul 2011

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Abstract

This paper presents an image-based talking head system that is able to synthesize realistic facial expressions accompanying speech, given arbitrary text input and control tags of facial expression. The smile is used as an example of a facial expression primitive. First, three types of videos are recorded: a performer speaking without any expression, smiling while speaking, and smiling after speaking. By analyzing the recorded audiovisual data, an expressive database is built that contains normalized neutral and smiling mouth images, as well as their associated features and expressive labels. The expressive talking head is synthesized by a unit selection algorithm, which selects and concatenates appropriate mouth image segments from the expressive database. Experimental results show that the smiles of the talking heads are objectively as realistic as real smiles, and viewers cannot distinguish the real smiles from the synthesized ones.
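The unit-selection step described above can be illustrated with a minimal, hypothetical Python sketch. It assumes a per-slot list of candidate database segments, a simple label-mismatch target cost, and a feature-distance concatenation cost; all names and cost functions below are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of unit selection: pick one candidate mouth-image
# segment per target slot so that the sum of target cost (match to the
# requested expressive label) and concatenation cost (visual smoothness
# between adjacent segments) is minimal, via a Viterbi-style search.

from dataclasses import dataclass

@dataclass
class Segment:
    features: float   # stand-in for the segment's visual feature vector
    label: str        # expressive label, e.g. "neutral" or "smile"

def target_cost(seg: Segment, wanted_label: str) -> float:
    # Penalize segments whose expressive label differs from the request.
    return 0.0 if seg.label == wanted_label else 1.0

def concat_cost(a: Segment, b: Segment) -> float:
    # Penalize visually dissimilar adjacent segments (feature distance).
    return abs(a.features - b.features)

def select_units(candidates: list[list[Segment]], labels: list[str]) -> list[int]:
    """Return the index of the chosen candidate for each slot."""
    n = len(candidates)
    cost = [[target_cost(s, labels[0]) for s in candidates[0]]]
    back = []
    for t in range(1, n):
        row, ptr = [], []
        for s in candidates[t]:
            prev = candidates[t - 1]
            best = min(range(len(prev)),
                       key=lambda j: cost[t - 1][j] + concat_cost(prev[j], s))
            row.append(cost[t - 1][best] + concat_cost(prev[best], s)
                       + target_cost(s, labels[t]))
            ptr.append(best)
        cost.append(row)
        back.append(ptr)
    # Trace back the cheapest path through the candidate lattice.
    path = [min(range(len(cost[-1])), key=cost[-1].__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

In this sketch the selected segments would then be concatenated (with blending) to render the final mouth sequence; the paper's actual cost functions and features are not specified in this record.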

Keywords

    facial expression, image-based animation, Talking head, unit selection


Cite this

Realistic facial expression synthesis for an image-based talking head. / Liu, Kang; Ostermann, Joern.
Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo: ICME 2011. 2011. 6011835 (Proceedings - IEEE International Conference on Multimedia and Expo).


Liu, K & Ostermann, J 2011, Realistic facial expression synthesis for an image-based talking head. in Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo: ICME 2011., 6011835, Proceedings - IEEE International Conference on Multimedia and Expo, 2011 12th IEEE International Conference on Multimedia and Expo, ICME 2011, Barcelona, Spain, 11 Jul 2011. https://doi.org/10.1109/ICME.2011.6011835
Liu, K., & Ostermann, J. (2011). Realistic facial expression synthesis for an image-based talking head. In Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo: ICME 2011. Article 6011835 (Proceedings - IEEE International Conference on Multimedia and Expo). https://doi.org/10.1109/ICME.2011.6011835
Liu K, Ostermann J. Realistic facial expression synthesis for an image-based talking head. In Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo: ICME 2011. 2011. 6011835. (Proceedings - IEEE International Conference on Multimedia and Expo). doi: 10.1109/ICME.2011.6011835
Liu, Kang ; Ostermann, Joern. / Realistic facial expression synthesis for an image-based talking head. Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo: ICME 2011. 2011. (Proceedings - IEEE International Conference on Multimedia and Expo).
@inproceedings{ec540e0aeb6d4e0bba6bba1a9d6e2f95,
title = "Realistic facial expression synthesis for an image-based talking head",
abstract = "This paper presents an image-based talking head system that is able to synthesize realistic facial expressions accompanying speech, given arbitrary text input and control tags of facial expression. As an example of facial expression primitives, smile is used. First, three types of videos are recorded: a performer speaking without any expressions, smiling while speaking, and smiling after speaking. By analyzing the recorded audiovisual data, an expressive database is built and contains normalized neutral mouth images and smiling mouth images, as well as their associated features and expressive labels. The expressive talking head is synthesized by a unit selection algorithm, which selects and concatenates appropriate mouth image segments from the expressive database. Experimental results show that the smiles of talking heads are as realistic as the real ones objectively, and the viewers cannot distinguish the real smiles from the synthesized ones.",
keywords = "facial expression, image-based animation, Talking head, unit selection",
author = "Kang Liu and Joern Ostermann",
year = "2011",
month = sep,
doi = "10.1109/ICME.2011.6011835",
language = "English",
isbn = "9781612843490",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
booktitle = "Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo",
note = "2011 12th IEEE International Conference on Multimedia and Expo, ICME 2011 ; Conference date: 11-07-2011 Through 15-07-2011",

}


TY - GEN

T1 - Realistic facial expression synthesis for an image-based talking head

AU - Liu, Kang

AU - Ostermann, Joern

PY - 2011/9

Y1 - 2011/9

N2 - This paper presents an image-based talking head system that is able to synthesize realistic facial expressions accompanying speech, given arbitrary text input and control tags of facial expression. As an example of facial expression primitives, smile is used. First, three types of videos are recorded: a performer speaking without any expressions, smiling while speaking, and smiling after speaking. By analyzing the recorded audiovisual data, an expressive database is built and contains normalized neutral mouth images and smiling mouth images, as well as their associated features and expressive labels. The expressive talking head is synthesized by a unit selection algorithm, which selects and concatenates appropriate mouth image segments from the expressive database. Experimental results show that the smiles of talking heads are as realistic as the real ones objectively, and the viewers cannot distinguish the real smiles from the synthesized ones.

AB - This paper presents an image-based talking head system that is able to synthesize realistic facial expressions accompanying speech, given arbitrary text input and control tags of facial expression. As an example of facial expression primitives, smile is used. First, three types of videos are recorded: a performer speaking without any expressions, smiling while speaking, and smiling after speaking. By analyzing the recorded audiovisual data, an expressive database is built and contains normalized neutral mouth images and smiling mouth images, as well as their associated features and expressive labels. The expressive talking head is synthesized by a unit selection algorithm, which selects and concatenates appropriate mouth image segments from the expressive database. Experimental results show that the smiles of talking heads are as realistic as the real ones objectively, and the viewers cannot distinguish the real smiles from the synthesized ones.

KW - facial expression

KW - image-based animation

KW - Talking head

KW - unit selection

UR - http://www.scopus.com/inward/record.url?scp=80155129710&partnerID=8YFLogxK

U2 - 10.1109/ICME.2011.6011835

DO - 10.1109/ICME.2011.6011835

M3 - Conference contribution

AN - SCOPUS:80155129710

SN - 9781612843490

T3 - Proceedings - IEEE International Conference on Multimedia and Expo

BT - Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo

T2 - 2011 12th IEEE International Conference on Multimedia and Expo, ICME 2011

Y2 - 11 July 2011 through 15 July 2011

ER -
