From audio-only to audio and video text-to-speech

Eric Cosatto; Hans Peter Graf; Jörn Ostermann; Juergen Schroeter

Details

Original language	English
Pages (from-to)	1084-1095
Number of pages	12
Journal	Acta Acustica united with Acustica
Volume	90
Issue number	6
Publication status	Published - Nov 2004

Abstract

Progress made with the AT&T sample-based visual text-to-speech (VTTS) system is discussed. The VTTS system from AT&T incorporates unit selection synthesis and a moderate-size recorded database of modified and concatenated video segments. It is suggested that several steps such as highly accurate image analysis tools for creating video clip databases, fast research techniques and rendering of composite face images on a graphic screen are very important to assure a high-quality sample-based VTTS system. It was found that accuracy and timeliness of lip closures and protrusions, turning points and overall smoothness are very critical for the system.

ASJC Scopus subject areas

Arts and Humanities (all)
Music
Physics and Astronomy (all)
Acoustics and Ultrasonics

Cite

From audio-only to audio and video text-to-speech. / Cosatto, Eric; Graf, Hans Peter; Ostermann, Jörn et al.
In: Acta Acustica united with Acustica, Vol. 90, No. 6, 11.2004, pp. 1084-1095.

Publication: Contribution to journal › Article › Research › Peer-reviewed

Cosatto, E, Graf, HP, Ostermann, J & Schroeter, J 2004, 'From audio-only to audio and video text-to-speech', Acta Acustica united with Acustica, vol. 90, no. 6, pp. 1084-1095.

Cosatto, E., Graf, H. P., Ostermann, J., & Schroeter, J. (2004). From audio-only to audio and video text-to-speech. Acta Acustica united with Acustica, 90(6), 1084-1095.

Cosatto E, Graf HP, Ostermann J, Schroeter J. From audio-only to audio and video text-to-speech. Acta Acustica united with Acustica. 2004 Nov;90(6):1084-1095.

Cosatto, Eric ; Graf, Hans Peter ; Ostermann, Jörn et al. / From audio-only to audio and video text-to-speech. In: Acta Acustica united with Acustica. 2004 ; Vol. 90, No. 6. pp. 1084-1095.

@article{d0508de0cb514692871afd5ed2ec9962,
title = "From audio-only to audio and video text-to-speech",
abstract = "Progress made with the AT&T sample-based visual text-to-speech (VTTS) system is discussed. The VTTS system from AT&T incorporates unit selection synthesis and a moderate-size recorded database of modified and concatenated video segments. It is suggested that several steps such as highly accurate image analysis tools for creating video clip databases, fast research techniques and rendering of composite face images on a graphic screen are very important to assure a high-quality sample-based VTTS system. It was found that accuracy and timeliness of lip closures and protrusions, turning points and overall smoothness are very critical for the system.",
author = "Eric Cosatto and Graf, {Hans Peter} and J{\"o}rn Ostermann and Juergen Schroeter",
year = "2004",
month = nov,
language = "English",
volume = "90",
pages = "1084--1095",
journal = "Acta Acustica united with Acustica",
issn = "1610-1928",
publisher = "S. Hirzel Verlag GmbH",
number = "6",
}

TY  - JOUR
T1  - From audio-only to audio and video text-to-speech
AU  - Cosatto, Eric
AU  - Graf, Hans Peter
AU  - Ostermann, Jörn
AU  - Schroeter, Juergen
PY  - 2004/11
Y1  - 2004/11
N2  - Progress made with the AT&T sample-based visual text-to-speech (VTTS) system is discussed. The VTTS system from AT&T incorporates unit selection synthesis and a moderate-size recorded database of modified and concatenated video segments. It is suggested that several steps such as highly accurate image analysis tools for creating video clip databases, fast research techniques and rendering of composite face images on a graphic screen are very important to assure a high-quality sample-based VTTS system. It was found that accuracy and timeliness of lip closures and protrusions, turning points and overall smoothness are very critical for the system.
AB  - Progress made with the AT&T sample-based visual text-to-speech (VTTS) system is discussed. The VTTS system from AT&T incorporates unit selection synthesis and a moderate-size recorded database of modified and concatenated video segments. It is suggested that several steps such as highly accurate image analysis tools for creating video clip databases, fast research techniques and rendering of composite face images on a graphic screen are very important to assure a high-quality sample-based VTTS system. It was found that accuracy and timeliness of lip closures and protrusions, turning points and overall smoothness are very critical for the system.
UR  - http://www.scopus.com/inward/record.url?scp=11244348117&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:11244348117
VL  - 90
SP  - 1084
EP  - 1095
JO  - Acta Acustica united with Acustica
JF  - Acta Acustica united with Acustica
SN  - 1610-1928
IS  - 6
ER  -
