Lifelike talking faces for interactive services

Publication: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Eric Cosatto
  • Jörn Ostermann
  • Hans Peter Graf
  • Juergen Schroeter

External organizations

  • AT&T Labs
  • Institute of Electrical and Electronics Engineers (IEEE)

Details

Original language: English
Pages (from-to): 1406-1428
Number of pages: 23
Journal: Proceedings of the IEEE
Volume: 91
Issue number: 9
Publication status: Published - September 2003
Externally published: Yes

Abstract

Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.
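
The sample-based approach described in the abstract works much like unit selection in concatenative speech synthesis: for each target mouth shape (viseme) in the utterance, a candidate sample is chosen from the recorded database so that a target cost (how well the sample matches the desired shape) and a concatenation cost (how smoothly adjacent samples join) are jointly minimized. The sketch below illustrates that selection with a toy Viterbi search; the database layout, feature vectors, and cost functions are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of sample-based ("unit selection") mouth animation:
# pick one recorded sample per target viseme so that the sum of
# target costs and concatenation costs is minimized (Viterbi search).
# Data layout and cost functions are illustrative assumptions only.

import math

def target_cost(candidate, target_features):
    # How well a database sample matches the desired viseme.
    return math.dist(candidate["features"], target_features)

def concat_cost(prev_candidate, candidate):
    # Visual discontinuity between two adjacent samples.
    return math.dist(prev_candidate["features"], candidate["features"])

def select_units(viseme_track, database):
    # trellis[t][j] = (cumulative cost, backpointer into column t-1, sample)
    trellis = []
    for t, viseme in enumerate(viseme_track):
        column = []
        for cand in database[viseme["name"]]:
            tc = target_cost(cand, viseme["features"])
            if t == 0:
                column.append((tc, None, cand))
            else:
                cost, back = min(
                    (prev_cost + concat_cost(prev_cand, cand), j)
                    for j, (prev_cost, _, prev_cand) in enumerate(trellis[-1])
                )
                column.append((cost + tc, back, cand))
        trellis.append(column)

    # Backtrack from the cheapest final state.
    j = min(range(len(trellis[-1])), key=lambda i: trellis[-1][i][0])
    path = []
    for column in reversed(trellis):
        _, back, cand = column[j]
        path.append(cand["id"])
        if back is not None:
            j = back
    return list(reversed(path))

# Toy usage: a two-viseme track and a tiny database of mouth samples.
database = {
    "ah": [{"id": "ah_1", "features": (0.9, 0.2)},
           {"id": "ah_2", "features": (0.8, 0.3)}],
    "m":  [{"id": "m_1", "features": (0.1, 0.1)}],
}
track = [{"name": "ah", "features": (0.85, 0.25)},
         {"name": "m", "features": (0.05, 0.1)}]
print(select_units(track, database))  # -> ['ah_2', 'm_1']
```

As a rough consistency check on the bandwidth figure quoted above: at an assumed animation rate of 30 frames/s, 800 bits/s leaves under 27 bits per frame, which suggests the facial animation parameters are heavily quantized and predictively coded rather than transmitted raw.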

ASJC Scopus subject areas

Cite

Lifelike talking faces for interactive services. / Cosatto, Eric; Ostermann, Jörn; Graf, Hans Peter et al.
In: Proceedings of the IEEE, Vol. 91, No. 9, 09.2003, p. 1406-1428.


Cosatto E, Ostermann J, Graf HP, Schroeter J. Lifelike talking faces for interactive services. Proceedings of the IEEE. 2003 Sep;91(9):1406-1428. doi: 10.1109/JPROC.2003.817141
Cosatto, Eric ; Ostermann, Jörn ; Graf, Hans Peter et al. / Lifelike talking faces for interactive services. In: Proceedings of the IEEE. 2003 ; Vol. 91, No. 9. pp. 1406-1428.
@article{e17ddb27301f41f0af1c7972d7cf5585,
title = "Lifelike talking faces for interactive services",
abstract = "Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such {"}visual prosody{"} is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.",
keywords = "Avatar, Computer graphics, Face animation, MPEG-4, Sample-based graphics, Speech synthesizer, Text-to-speech (TTS), Video-based rendering, Visual text-to-speech (VTTS)",
author = "Eric Cosatto and J{\"o}rn Ostermann and Graf, {Hans Peter} and Juergen Schroeter",
year = "2003",
month = sep,
doi = "10.1109/JPROC.2003.817141",
language = "English",
volume = "91",
pages = "1406--1428",
journal = "Proceedings of the IEEE",
issn = "0018-9219",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "9",
}


TY - JOUR
T1 - Lifelike talking faces for interactive services
AU - Cosatto, Eric
AU - Ostermann, Jörn
AU - Graf, Hans Peter
AU - Schroeter, Juergen
PY - 2003/9
Y1 - 2003/9
N2 - Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.
AB - Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.
KW - Avatar
KW - Computer graphics
KW - Face animation
KW - MPEG-4
KW - Sample-based graphics
KW - Speech synthesizer
KW - Text-to-speech (TTS)
KW - Video-based rendering
KW - Visual text-to-speech (VTTS)
UR - http://www.scopus.com/inward/record.url?scp=10044281988&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2003.817141
DO - 10.1109/JPROC.2003.817141
M3 - Article
AN - SCOPUS:10044281988
VL - 91
SP - 1406
EP - 1428
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
SN - 0018-9219
IS - 9
ER -
