Optimization of an Image-Based Talking Head System

Publication: Contribution to journal › Article › Research › Peer review

Authorship

Details

Original language: English
Article number: 174192
Journal: EURASIP Journal on Audio, Speech, and Music Processing
Volume: 2009
Publication status: Published - 30 Sep 2009

Abstract

This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.
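The two mechanisms the abstract names, cost-based unit selection (lip synchronization plus similarity of consecutive images) and Pareto optimization of the selection weights, can be sketched in a few lines. The following is a hypothetical illustration, not the authors' implementation: the functions `target_cost` and `concat_cost`, the weights `w_sync`/`w_smooth`, and the database layout are all assumptions made for the sketch.

```python
def target_cost(candidate_viseme: str, required_viseme: str) -> float:
    """Lip-synchronization cost: 0 if the candidate image's viseme
    matches the viseme required by the phonetic transcript, else 1."""
    return 0.0 if candidate_viseme == required_viseme else 1.0


def concat_cost(prev_pixels: list, cur_pixels: list) -> float:
    """Smoothness cost between consecutive mouth images:
    mean absolute pixel difference (toy stand-in for image similarity)."""
    return sum(abs(a - b) for a, b in zip(prev_pixels, cur_pixels)) / len(cur_pixels)


def select_units(required_visemes, database, w_sync=1.0, w_smooth=0.5):
    """Greedy unit selection: per frame, pick the mouth image that
    minimizes a weighted sum of lip-sync and concatenation costs.
    Each database entry is a dict with 'viseme' and 'pixels' keys."""
    path, prev = [], None
    for viseme in required_visemes:
        best = min(
            database,
            key=lambda u: w_sync * target_cost(u["viseme"], viseme)
            + (w_smooth * concat_cost(prev["pixels"], u["pixels"]) if prev else 0.0),
        )
        path.append(best)
        prev = best
    return path


def pareto_front(points):
    """Keep (sync_error, smoothness_error) configurations that are not
    dominated, i.e. no other point is at least as good in both objectives."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]
```

A Viterbi-style search over the whole utterance, rather than this greedy per-frame choice, would be closer to typical unit-selection systems; the Pareto front is what allows training to expose weight settings that trade off lip synchronization against visual smoothness instead of committing to a single scalar objective.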

ASJC Scopus subject areas

Cite

Optimization of an Image-Based Talking Head System. / Liu, Kang; Ostermann, Joern.
In: EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2009, 174192, 30.09.2009.


Download (BibTeX)
@article{6ce1c8d56b77445ea281b797be8257ed,
title = "Optimization of an Image-Based Talking Head System",
abstract = "This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.",
author = "Kang Liu and Joern Ostermann",
note = "Funding information: This research work was funded by EC within FP6 under Grant 511568 with the acronym 3DTV. The authors acknowledge Holger Blume for his support with the Pareto optimization software. The authors would like to thank Tobias Elbrandt for his helpful comments and suggestions in the evaluation of the subjective tests. The authors also wish to thank all the people involved in the subjective tests.",
year = "2009",
month = sep,
day = "30",
doi = "10.1155/2009/174192",
language = "English",
volume = "2009",
journal = "EURASIP Journal on Audio, Speech, and Music Processing",
issn = "1687-4714",
publisher = "Springer Publishing Company",

}

Download (RIS)

TY - JOUR

T1 - Optimization of an Image-Based Talking Head System

AU - Liu, Kang

AU - Ostermann, Joern

N1 - Funding information: This research work was funded by EC within FP6 under Grant 511568 with the acronym 3DTV. The authors acknowledge Holger Blume for his support with the Pareto optimization software. The authors would like to thank Tobias Elbrandt for his helpful comments and suggestions in the evaluation of the subjective tests. The authors also wish to thank all the people involved in the subjective tests.

PY - 2009/9/30

Y1 - 2009/9/30

N2 - This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.

AB - This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.

UR - http://www.scopus.com/inward/record.url?scp=70350007964&partnerID=8YFLogxK

U2 - 10.1155/2009/174192

DO - 10.1155/2009/174192

M3 - Article

AN - SCOPUS:70350007964

VL - 2009

JO - EURASIP Journal on Audio, Speech, and Music Processing

JF - EURASIP Journal on Audio, Speech, and Music Processing

SN - 1687-4714

M1 - 174192

ER -
