Multimodal Speech Synthesis

Publication: Conference contribution, Paper, Research, Peer-reviewed

Authors

  • J. Schroeter
  • J. Ostermann
  • H. P. Graf
  • M. Beutnagel
  • E. Cosatto
  • A. Syrdal
  • A. Conkie
  • Y. Stylianou

External organizations

  • AT&T Labs

Details

Original language: English
Pages: 571-574
Number of pages: 4
Publication status: Published - 2000
Externally published: Yes
Event: 2000 IEEE International Conference on Multimedia and Expo (ICME 2000) - New York, NY, United States
Duration: 30 July 2000 to 2 August 2000

Conference

Conference: 2000 IEEE International Conference on Multimedia and Expo (ICME 2000)
Country/Territory: United States
City: New York, NY
Period: 30 July 2000 to 2 August 2000

Abstract

Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

Cite

Multimodal Speech Synthesis. / Schroeter, J.; Ostermann, J.; Graf, H. P. et al.
2000. 571-574 Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.

Schroeter, J, Ostermann, J, Graf, HP, Beutnagel, M, Cosatto, E, Syrdal, A, Conkie, A & Stylianou, Y 2000, 'Multimodal Speech Synthesis', Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States, 30 July 2000 - 2 August 2000 pp. 571-574.
Schroeter, J., Ostermann, J., Graf, H. P., Beutnagel, M., Cosatto, E., Syrdal, A., Conkie, A., & Stylianou, Y. (2000). Multimodal Speech Synthesis. 571-574. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.
Schroeter J, Ostermann J, Graf HP, Beutnagel M, Cosatto E, Syrdal A et al. Multimodal Speech Synthesis. 2000. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.
Schroeter, J. ; Ostermann, J. ; Graf, H. P. et al. / Multimodal Speech Synthesis. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States. 4 p.
@conference{b71c03dea6dc49c38cdbcb2da079136d,
title = "Multimodal Speech Synthesis",
abstract = "Multimodal Speech Synthesis ({"}Talking Heads{"}) encompasses synthesis of speech from text ({"}Text-to-Speech{"}, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ({"}Visual TTS{"}, VTTS). Talking Heads are now practical because of ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.",
author = "J. Schroeter and J. Ostermann and Graf, {H. P.} and M. Beutnagel and E. Cosatto and A. Syrdal and A. Conkie and Y. Stylianou",
year = "2000",
language = "English",
pages = "571--574",
note = "2000 IEEE International Conference on Multimedia and Expo (ICME 2000) ; Conference date: 30-07-2000 Through 02-08-2000",

}

TY - CONF

T1 - Multimodal Speech Synthesis

AU - Schroeter, J.

AU - Ostermann, J.

AU - Graf, H. P.

AU - Beutnagel, M.

AU - Cosatto, E.

AU - Syrdal, A.

AU - Conkie, A.

AU - Stylianou, Y.

PY - 2000

Y1 - 2000

N2 - Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

AB - Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

UR - http://www.scopus.com/inward/record.url?scp=0034509487&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:0034509487

SP - 571

EP - 574

T2 - 2000 IEEE International Conference on Multimedia and Expo (ICME 2000)

Y2 - 30 July 2000 through 2 August 2000

ER -
