Lifelike talking faces for interactive services

Publication: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Eric Cosatto
  • Jörn Ostermann
  • Hans Peter Graf
  • Juergen Schroeter

External organizations

  • AT&T Labs
  • Institute of Electrical and Electronics Engineers (IEEE)

Details

Original language: English
Pages (from-to): 1406-1428
Number of pages: 23
Journal: Proceedings of the IEEE
Volume: 91
Issue number: 9
Publication status: Published - September 2003
Externally published: Yes

Abstract

Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.
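
The sample-based approach described in the abstract works much like unit selection in concatenative speech synthesis: for each target mouth shape (viseme) in the utterance, a candidate sample is chosen from the recorded database so that a target cost (how well the sample matches the desired shape) and a concatenation cost (how smoothly adjacent samples join) are jointly minimized. The sketch below illustrates that selection with a toy Viterbi search; the database layout, feature vectors, and cost functions are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of sample-based ("unit selection") mouth animation:
# pick one recorded sample per target viseme so that the sum of
# target costs and concatenation costs is minimized (Viterbi search).
# Data layout and cost functions are illustrative assumptions only.

import math

def target_cost(candidate, target_features):
    # How well a database sample matches the desired viseme.
    return math.dist(candidate["features"], target_features)

def concat_cost(prev_candidate, candidate):
    # Visual discontinuity between two adjacent samples.
    return math.dist(prev_candidate["features"], candidate["features"])

def select_units(viseme_track, database):
    # trellis[t][j] = (cumulative cost, backpointer into column t-1, sample)
    trellis = []
    for t, viseme in enumerate(viseme_track):
        column = []
        for cand in database[viseme["name"]]:
            tc = target_cost(cand, viseme["features"])
            if t == 0:
                column.append((tc, None, cand))
            else:
                cost, back = min(
                    (prev_cost + concat_cost(prev_cand, cand), j)
                    for j, (prev_cost, _, prev_cand) in enumerate(trellis[-1])
                )
                column.append((cost + tc, back, cand))
        trellis.append(column)

    # Backtrack from the cheapest final state.
    j = min(range(len(trellis[-1])), key=lambda i: trellis[-1][i][0])
    path = []
    for column in reversed(trellis):
        _, back, cand = column[j]
        path.append(cand["id"])
        if back is not None:
            j = back
    return list(reversed(path))

# Toy usage: a two-viseme track and a tiny database of mouth samples.
database = {
    "ah": [{"id": "ah_1", "features": (0.9, 0.2)},
           {"id": "ah_2", "features": (0.8, 0.3)}],
    "m":  [{"id": "m_1", "features": (0.1, 0.1)}],
}
track = [{"name": "ah", "features": (0.85, 0.25)},
         {"name": "m", "features": (0.05, 0.1)}]
print(select_units(track, database))  # -> ['ah_2', 'm_1']
```

As a rough consistency check on the bandwidth figure quoted above: at an assumed animation rate of 30 frames/s, 800 bits/s leaves under 27 bits per frame, which suggests the facial animation parameters are heavily quantized and predictively coded rather than transmitted raw.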

ASJC Scopus subject areas

Cite

Lifelike talking faces for interactive services. / Cosatto, Eric; Ostermann, Jörn; Graf, Hans Peter et al.
In: Proceedings of the IEEE, Vol. 91, No. 9, 09.2003, p. 1406-1428.


Cosatto E, Ostermann J, Graf HP, Schroeter J. Lifelike talking faces for interactive services. Proceedings of the IEEE. 2003 Sep;91(9):1406-1428. doi: 10.1109/JPROC.2003.817141
Cosatto, Eric ; Ostermann, Jörn ; Graf, Hans Peter et al. / Lifelike talking faces for interactive services. In: Proceedings of the IEEE. 2003 ; Vol. 91, No. 9. pp. 1406-1428.
@article{e17ddb27301f41f0af1c7972d7cf5585,
title = "Lifelike talking faces for interactive services",
abstract = "Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such {"}visual prosody{"} is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.",
keywords = "Avatar, Computer graphics, Face animation, MPEG-4, Sample-based graphics, Speech synthesizer, Text-to-speech (TTS), Video-based rendering, Visual text-to-speech (VTTS)",
author = "Eric Cosatto and J{\"o}rn Ostermann and Graf, {Hans Peter} and Juergen Schroeter",
year = "2003",
month = sep,
doi = "10.1109/JPROC.2003.817141",
language = "English",
volume = "91",
pages = "1406--1428",
journal = "Proceedings of the IEEE",
issn = "0018-9219",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "9",
}


TY - JOUR
T1 - Lifelike talking faces for interactive services
AU - Cosatto, Eric
AU - Ostermann, Jörn
AU - Graf, Hans Peter
AU - Schroeter, Juergen
PY - 2003/9
Y1 - 2003/9
N2 - Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.
AB - Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded video instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations, at the expense of a larger database size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.
KW - Avatar
KW - Computer graphics
KW - Face animation
KW - MPEG-4
KW - Sample-based graphics
KW - Speech synthesizer
KW - Text-to-speech (TTS)
KW - Video-based rendering
KW - Visual text-to-speech (VTTS)
UR - http://www.scopus.com/inward/record.url?scp=10044281988&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2003.817141
DO - 10.1109/JPROC.2003.817141
M3 - Article
AN - SCOPUS:10044281988
VL - 91
SP - 1406
EP - 1428
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
SN - 0018-9219
IS - 9
ER -
