Optimization of an Image-Based Talking Head System

Publication: Contribution to journal › Article › Research › Peer review

Authorship

Details

Original language: English
Article number: 174192
Journal: EURASIP Journal on Audio, Speech, and Music Processing
Volume: 2009
Publication status: Published - 30 Sep 2009

Abstract

This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.
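The two mechanisms the abstract names, cost-based unit selection (lip synchronization plus similarity of consecutive images) and Pareto optimization of the selection weights, can be sketched in a few lines. The following is a hypothetical illustration, not the authors' implementation: the functions `target_cost` and `concat_cost`, the weights `w_sync`/`w_smooth`, and the database layout are all assumptions made for the sketch.

```python
def target_cost(candidate_viseme: str, required_viseme: str) -> float:
    """Lip-synchronization cost: 0 if the candidate image's viseme
    matches the viseme required by the phonetic transcript, else 1."""
    return 0.0 if candidate_viseme == required_viseme else 1.0


def concat_cost(prev_pixels: list, cur_pixels: list) -> float:
    """Smoothness cost between consecutive mouth images:
    mean absolute pixel difference (toy stand-in for image similarity)."""
    return sum(abs(a - b) for a, b in zip(prev_pixels, cur_pixels)) / len(cur_pixels)


def select_units(required_visemes, database, w_sync=1.0, w_smooth=0.5):
    """Greedy unit selection: per frame, pick the mouth image that
    minimizes a weighted sum of lip-sync and concatenation costs.
    Each database entry is a dict with 'viseme' and 'pixels' keys."""
    path, prev = [], None
    for viseme in required_visemes:
        best = min(
            database,
            key=lambda u: w_sync * target_cost(u["viseme"], viseme)
            + (w_smooth * concat_cost(prev["pixels"], u["pixels"]) if prev else 0.0),
        )
        path.append(best)
        prev = best
    return path


def pareto_front(points):
    """Keep (sync_error, smoothness_error) configurations that are not
    dominated, i.e. no other point is at least as good in both objectives."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]
```

A Viterbi-style search over the whole utterance, rather than this greedy per-frame choice, would be closer to typical unit-selection systems; the Pareto front is what allows training to expose weight settings that trade off lip synchronization against visual smoothness instead of committing to a single scalar objective.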

ASJC Scopus subject areas

Cite

Optimization of an Image-Based Talking Head System. / Liu, Kang; Ostermann, Joern.
In: EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2009, 174192, 30.09.2009.


Download (BibTeX)
@article{6ce1c8d56b77445ea281b797be8257ed,
title = "Optimization of an Image-Based Talking Head System",
abstract = "This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.",
author = "Kang Liu and Joern Ostermann",
note = "Funding information: This research work was funded by EC within FP6 under Grant 511568 with the acronym 3DTV. The authors acknowledge Holger Blume for his support with the Pareto optimization software. The authors would like to thank Tobias Elbrandt for his helpful comments and suggestions in the evaluation of the subjective tests. The authors also wish to thank all the people involved in the subjective tests.",
year = "2009",
month = sep,
day = "30",
doi = "10.1155/2009/174192",
language = "English",
volume = "2009",
journal = "EURASIP Journal on Audio, Speech, and Music Processing",
issn = "1687-4714",
publisher = "Springer Publishing Company",

}

Download (RIS)

TY - JOUR

T1 - Optimization of an Image-Based Talking Head System

AU - Liu, Kang

AU - Ostermann, Joern

N1 - Funding information: This research work was funded by EC within FP6 under Grant 511568 with the acronym 3DTV. The authors acknowledge Holger Blume for his support with the Pareto optimization software. The authors would like to thank Tobias Elbrandt for his helpful comments and suggestions in the evaluation of the subjective tests. The authors also wish to thank all the people involved in the subjective tests.

PY - 2009/9/30

Y1 - 2009/9/30

N2 - This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.

AB - This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.

UR - http://www.scopus.com/inward/record.url?scp=70350007964&partnerID=8YFLogxK

U2 - 10.1155/2009/174192

DO - 10.1155/2009/174192

M3 - Article

AN - SCOPUS:70350007964

VL - 2009

JO - EURASIP Journal on Audio, Speech, and Music Processing

JF - EURASIP Journal on Audio, Speech, and Music Processing

SN - 1687-4714

M1 - 174192

ER -
