Details
Original language | English |
---|---|
Article number | 174192 |
Journal | Eurasip Journal on Audio, Speech, and Music Processing |
Volume | 2009 |
Publication status | Published - 30 Sept 2009 |
Abstract
This paper presents an image-based talking head system consisting of two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, composed of a personalized 3D mask and a large database of mouth images with their related information. The synthesis part generates natural-looking facial animations from phonetic transcripts of text. A critical step in synthesis is unit selection, which selects appropriate mouth images from the database and concatenates them so that they match the spoken words of the talking head. Selection is based on lip synchronization and on the similarity of consecutive images. This paper refines the unit selection and uses Pareto optimization to train it. Subjective tests show that most participants cannot distinguish our facial animations from real videos.
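The abstract describes unit selection only at a high level. As a hedged illustration, the sketch below shows the generic technique commonly used for this kind of problem: a Viterbi (dynamic-programming) search that picks one mouth image per target phoneme by minimizing a weighted sum of a lip-synchronization (target) cost and a smoothness (concatenation) cost between consecutive images. Sweeping the two weights and keeping the non-dominated settings yields a simple Pareto front. The `MouthImage` type, the 0/1 phoneme-match target cost, the Euclidean feature distance, the helper names and the weight grid are all hypothetical placeholders, not the cost functions or training procedure used in the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class MouthImage:
    # Hypothetical database entry: the phoneme the image was recorded for
    # and a small appearance feature vector (assumed, e.g. PCA coefficients).
    phoneme: str
    features: Tuple[float, ...]


def target_cost(img: MouthImage, target_phoneme: str) -> float:
    # Assumed lip-sync term: penalize images recorded for another phoneme.
    return 0.0 if img.phoneme == target_phoneme else 1.0


def concat_cost(a: MouthImage, b: MouthImage) -> float:
    # Assumed smoothness term: feature distance between consecutive images.
    return math.dist(a.features, b.features)


def select_units(targets: Sequence[str],
                 candidates: Sequence[Sequence[MouthImage]],
                 w_target: float, w_concat: float) -> List[int]:
    """Viterbi search; returns the chosen candidate index per target frame."""
    # cost[t][j]: cheapest cumulative cost of a path ending in candidate j at frame t
    cost = [[w_target * target_cost(c, targets[0]) for c in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for t in range(1, len(targets)):
        row, ptr = [], []
        for cand in candidates[t]:
            # Cheapest way to reach `cand` from any candidate of frame t - 1.
            scores = [cost[t - 1][i] + w_concat * concat_cost(prev, cand)
                      for i, prev in enumerate(candidates[t - 1])]
            best_i = min(range(len(scores)), key=scores.__getitem__)
            row.append(scores[best_i] + w_target * target_cost(cand, targets[t]))
            ptr.append(best_i)
        cost.append(row)
        back.append(ptr)
    # Backtrack from the cheapest candidate of the last frame.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [j]
    for t in range(len(targets) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]


def objectives(path: Sequence[int], targets: Sequence[str],
               candidates: Sequence[Sequence[MouthImage]]) -> Tuple[float, float]:
    # Unweighted lip-sync error and smoothness cost of a selected path.
    imgs = [candidates[t][j] for t, j in enumerate(path)]
    lip = sum(target_cost(img, tp) for img, tp in zip(imgs, targets))
    smooth = sum(concat_cost(a, b) for a, b in zip(imgs, imgs[1:]))
    return lip, smooth


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    # Indices of non-dominated points when both objectives are minimized.
    return [i for i, p in enumerate(points)
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]


if __name__ == "__main__":
    # Toy data: three target phonemes with two candidate images each.
    targets = ["m", "a", "a"]
    candidates = [
        [MouthImage("m", (0.0, 0.0)), MouthImage("a", (1.0, 1.0))],
        [MouthImage("a", (1.0, 1.0)), MouthImage("m", (0.1, 0.0))],
        [MouthImage("a", (1.1, 1.0)), MouthImage("a", (5.0, 5.0))],
    ]
    weights = [(1.0, 0.0), (1.0, 0.5), (0.0, 1.0)]
    scores = [objectives(select_units(targets, candidates, wt, wc), targets, candidates)
              for wt, wc in weights]
    for i in pareto_front(scores):
        print("Pareto-optimal weights:", weights[i], "objectives:", scores[i])
```

Each weight pair trades lip-synchronization accuracy against smoothness; the Pareto front keeps only the settings that no other setting beats on both objectives at once, so a final operating point still has to be chosen among them.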
ASJC Scopus subject areas
- Physics and Astronomy (all)
- Acoustics and Ultrasonics
- Engineering (all)
- Electrical and Electronic Engineering
Citation
Optimization of an Image-Based Talking Head System. / Liu, Kang; Ostermann, Joern. In: Eurasip Journal on Audio, Speech, and Music Processing, Vol. 2009, 174192, 30.09.2009.
Publication: Contribution to journal › Article › Research › Peer-reviewed
TY - JOUR
T1 - Optimization of an Image-Based Talking Head System
AU - Liu, Kang
AU - Ostermann, Joern
N1 - Funding information: This research work was funded by the EC within FP6 under Grant 511568 (acronym 3DTV). The authors acknowledge Holger Blume for his support with the Pareto optimization software. The authors would like to thank Tobias Elbrandt for his helpful comments and suggestions in the evaluation of the subjective tests. The authors also wish to thank all the people involved in the subjective tests.
PY - 2009/9/30
Y1 - 2009/9/30
UR - http://www.scopus.com/inward/record.url?scp=70350007964&partnerID=8YFLogxK
U2 - 10.1155/2009/174192
DO - 10.1155/2009/174192
M3 - Article
AN - SCOPUS:70350007964
VL - 2009
JO - Eurasip Journal on Audio, Speech, and Music Processing
JF - Eurasip Journal on Audio, Speech, and Music Processing
SN - 1687-4714
M1 - 174192
ER -