Multimodal Speech Synthesis

Research output: Contribution to conference, Paper, Research, peer review

Authors

  • J. Schroeter
  • J. Ostermann
  • H. P. Graf
  • M. Beutnagel
  • E. Cosatto
  • A. Syrdal
  • A. Conkie
  • Y. Stylianou

External Research Organisations

  • AT&T Labs

Details

Original language: English
Pages: 571-574
Number of pages: 4
Publication status: Published - 2000
Externally published: Yes
Event: 2000 IEEE International Conference on Multimedia and Expo (ICME 2000) - New York, NY, United States
Duration: 30 Jul 2000 - 2 Aug 2000

Conference

Conference: 2000 IEEE International Conference on Multimedia and Expo (ICME 2000)
Country/Territory: United States
City: New York, NY
Period: 30 Jul 2000 - 2 Aug 2000

Abstract

Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

Cite this

Schroeter, J., Ostermann, J., Graf, H. P., Beutnagel, M., Cosatto, E., Syrdal, A., Conkie, A., & Stylianou, Y. (2000). Multimodal Speech Synthesis. 571-574. Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States.
@conference{b71c03dea6dc49c38cdbcb2da079136d,
title = "Multimodal Speech Synthesis",
abstract = "Multimodal Speech Synthesis ({"}Talking Heads{"}) encompasses synthesis of speech from text ({"}Text-to-Speech{"}, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ({"}Visual TTS{"}, VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.",
author = "J. Schroeter and J. Ostermann and Graf, {H. P.} and M. Beutnagel and E. Cosatto and A. Syrdal and A. Conkie and Y. Stylianou",
year = "2000",
language = "English",
pages = "571--574",
note = "2000 IEEE International Conference on Multimedia and Expo (ICME 2000) ; Conference date: 30-07-2000 Through 02-08-2000",

}

TY - CONF

T1 - Multimodal Speech Synthesis

AU - Schroeter, J.

AU - Ostermann, J.

AU - Graf, H. P.

AU - Beutnagel, M.

AU - Cosatto, E.

AU - Syrdal, A.

AU - Conkie, A.

AU - Stylianou, Y.

PY - 2000

Y1 - 2000

N2 - Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of Multimodal Speech Synthesis in communications and e-commerce applications.

UR - http://www.scopus.com/inward/record.url?scp=0034509487&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:0034509487

SP - 571

EP - 574

T2 - 2000 IEEE International Conference on Multimedia and Expo (ICME 2000)

Y2 - 30 July 2000 through 2 August 2000

ER -
