Multimodal Speech Synthesis

J. Schroeter; J. Ostermann; H. P. Graf; M. Beutnagel; E. Cosatto; A. Syrdal; A. Conkie; Y. Stylianou

Details

Original language	English
Pages	571-574
Number of pages	4
Publication status	Published - 2000
Externally published	Yes
Event	2000 IEEE International Conference on Multimedia and Expo (ICME 2000) - New York, NY, United States Duration: 30 Jul 2000 → 2 Aug 2000

Conference

Conference	2000 IEEE International Conference on Multimedia and Expo (ICME 2000)
Country/Territory	United States
City	New York, NY
Period	30 Jul 2000 → 2 Aug 2000

Abstract

Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.
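To make the lip-sync idea in the abstract concrete, here is a minimal sketch of one common way a visual channel can follow a TTS channel. It assumes, hypothetically, that the TTS engine reports each phoneme with a start time and duration. The phoneme symbols, the viseme classes in `PHONEME_TO_VISEME`, and the function `visemes_from_phonemes` are illustrative assumptions, not the method described in the paper.

```python
from dataclasses import dataclass

# Hypothetical, deliberately small phoneme-to-viseme table (ARPAbet-style symbols).
# Real systems use a fuller viseme inventory; this table is illustrative only.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open", "ah": "open",
    "iy": "spread", "ih": "spread",
    "uw": "rounded", "ow": "rounded", "w": "rounded",
    "sil": "rest",
}


@dataclass
class Phoneme:
    symbol: str      # phoneme label, assumed to come from the TTS front end
    start: float     # start time in seconds
    duration: float  # duration in seconds


@dataclass
class Keyframe:
    time: float      # time in seconds at which the face should show this viseme
    viseme: str      # mouth-shape class


def visemes_from_phonemes(phonemes, default="neutral"):
    """Convert timed phonemes into viseme keyframes for a face renderer.

    One keyframe is placed at the midpoint of each phoneme; unknown phonemes
    map to `default`. If consecutive phonemes share a viseme, only the first
    keyframe is kept, so the mouth holds that shape instead of re-triggering.
    """
    keyframes = []
    for ph in phonemes:
        viseme = PHONEME_TO_VISEME.get(ph.symbol, default)
        if keyframes and keyframes[-1].viseme == viseme:
            continue  # same mouth shape as the previous keyframe: merge
        midpoint = round(ph.start + ph.duration / 2, 3)
        keyframes.append(Keyframe(midpoint, viseme))
    return keyframes


if __name__ == "__main__":
    # Timings are made up for illustration; a real TTS engine would supply them.
    word = [
        Phoneme("sil", 0.00, 0.10),
        Phoneme("m", 0.10, 0.08),
        Phoneme("aa", 0.18, 0.12),
        Phoneme("p", 0.30, 0.06),
        Phoneme("sil", 0.36, 0.10),
    ]
    for kf in visemes_from_phonemes(word):
        print(f"{kf.time:.3f}s  {kf.viseme}")
```

Because the keyframe times are derived from the same phoneme timings that drive the audio, the two channels stay aligned by construction. A renderer would then interpolate mouth shapes between keyframes.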

ASJC Scopus subject areas

Engineering(all)
General Engineering

Cite this

Multimodal Speech Synthesis. / Schroeter, J.; Ostermann, J.; Graf, H. P. et al.
2000. 571-574. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.

Research output: Contribution to conference › Paper › Research › peer review

Schroeter, J, Ostermann, J, Graf, HP, Beutnagel, M, Cosatto, E, Syrdal, A, Conkie, A & Stylianou, Y 2000, 'Multimodal Speech Synthesis', Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States, 30 Jul 2000 - 2 Aug 2000 pp. 571-574.

Schroeter, J., Ostermann, J., Graf, H. P., Beutnagel, M., Cosatto, E., Syrdal, A., Conkie, A., & Stylianou, Y. (2000). Multimodal Speech Synthesis. 571-574. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.

Schroeter J, Ostermann J, Graf HP, Beutnagel M, Cosatto E, Syrdal A et al. Multimodal Speech Synthesis. 2000. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.

Schroeter, J. ; Ostermann, J. ; Graf, H. P. et al. / Multimodal Speech Synthesis. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States. 4 p.

@conference{b71c03dea6dc49c38cdbcb2da079136d,

title = "Multimodal Speech Synthesis",

abstract = "Multimodal Speech Synthesis ({"}Talking Heads{"}) encompasses synthesis of speech from text ({"}Text-to-Speech{"}, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ({"}Visual TTS{"}, VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.",

author = "J. Schroeter and J. Ostermann and Graf, {H. P.} and M. Beutnagel and E. Cosatto and A. Syrdal and A. Conkie and Y. Stylianou",

year = "2000",

language = "English",

pages = "571--574",

note = "2000 IEEE International Conference on Multimedia and Expo (ICME 2000) ; Conference date: 30-07-2000 Through 02-08-2000",

}

TY - CONF

T1 - Multimodal Speech Synthesis

AU - Schroeter, J.

AU - Ostermann, J.

AU - Graf, H. P.

AU - Beutnagel, M.

AU - Cosatto, E.

AU - Syrdal, A.

AU - Conkie, A.

AU - Stylianou, Y.

PY - 2000

Y1 - 2000

N2 - Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

AB - Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

UR - http://www.scopus.com/inward/record.url?scp=0034509487&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:0034509487

SP - 571

EP - 574

T2 - 2000 IEEE International Conference on Multimedia and Expo (ICME 2000)

Y2 - 30 July 2000 through 2 August 2000

ER -

Research@Leibniz University

By the same author(s)

On the Rate-Distortion-Complexity Trade-Offs of Neural Video Coding

Self-supervised domain adaptation for machinery remaining useful life prediction

MaskCRT: Masked Conditional Residual Transformer for Learned Video Compression

Acoustic Emission Detection in Noisy Environments using Linear Prediction

Genie: the first open-source ISO/IEC encoder for genomic data