Multimodal Speech Synthesis

J. Schroeter; J. Ostermann; H. P. Graf; M. Beutnagel; E. Cosatto; A. Syrdal; A. Conkie; Y. Stylianou

Details

Original language	English
Pages	571-574
Number of pages	4
Publication status	Published - 2000
Externally published	Yes
Event	2000 IEEE International Conference on Multimedia and Expo (ICME 2000) - New York, NY, United States Duration: 30 Jul 2000 → 2 Aug 2000

Conference

Conference	2000 IEEE International Conference on Multimedia and Expo (ICME 2000)
Country/Territory	United States
City	New York, NY
Period	30 Jul 2000 → 2 Aug 2000

Abstract

Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

ASJC Scopus subject areas

Engineering (all)
General Mechanical Engineering

Cite

Multimodal Speech Synthesis. / Schroeter, J.; Ostermann, J.; Graf, H. P. et al.
2000. 571-574 Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.

Research output: Contribution to conference › Paper › Research › Peer-reviewed

Schroeter, J, Ostermann, J, Graf, HP, Beutnagel, M, Cosatto, E, Syrdal, A, Conkie, A & Stylianou, Y 2000, 'Multimodal Speech Synthesis', Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States, 30 Jul 2000 - 2 Aug 2000 pp. 571-574.

Schroeter, J., Ostermann, J., Graf, H. P., Beutnagel, M., Cosatto, E., Syrdal, A., Conkie, A., & Stylianou, Y. (2000). Multimodal Speech Synthesis. 571-574. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.

Schroeter J, Ostermann J, Graf HP, Beutnagel M, Cosatto E, Syrdal A et al. Multimodal Speech Synthesis. 2000. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.

Schroeter, J. ; Ostermann, J. ; Graf, H. P. et al. / Multimodal Speech Synthesis. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States. 4 p.

@conference{b71c03dea6dc49c38cdbcb2da079136d,

title = "Multimodal Speech Synthesis",

abstract = "Multimodal Speech Synthesis ({"}Talking Heads{"}) encompasses synthesis of speech from text ({"}Text-to-Speech{"}, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ({"}Visual TTS{"}, VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.",

author = "J. Schroeter and J. Ostermann and Graf, {H. P.} and M. Beutnagel and E. Cosatto and A. Syrdal and A. Conkie and Y. Stylianou",

year = "2000",

language = "English",

pages = "571--574",

note = "2000 IEEE International Conference on Multimedia and Expo (ICME 2000) ; Conference date: 30-07-2000 Through 02-08-2000",

}

TY - CONF

T1 - Multimodal Speech Synthesis

AU - Schroeter, J.

AU - Ostermann, J.

AU - Graf, H. P.

AU - Beutnagel, M.

AU - Cosatto, E.

AU - Syrdal, A.

AU - Conkie, A.

AU - Stylianou, Y.

PY - 2000

Y1 - 2000

N2 - Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

AB - Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

UR - http://www.scopus.com/inward/record.url?scp=0034509487&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:0034509487

SP - 571

EP - 574

T2 - 2000 IEEE International Conference on Multimedia and Expo (ICME 2000)

Y2 - 30 July 2000 through 2 August 2000

ER -