Visual speech synthesis from 3D mesh sequences driven by combined speech features

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors: Felix Kuhnke, Jörn Ostermann

Details

Original language: English
Title of host publication: 2017 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: ICME 2017
Publisher: IEEE Computer Society
Pages: 1075-1080
Number of pages: 6
ISBN (electronic): 9781509060672
Publication status: Published - 28 Aug 2017
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 10 Jul 2017 - 14 Jul 2017

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (electronic): 1945-788X

Abstract

Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure that produces realistic speech animation for arbitrary speech input. Speech features are mapped to model parameters using random forest regression. We propose a new speech feature that combines phonemic labels with acoustic features; this combined feature produces more expressive facial animation and robustly handles temporal labeling errors. Furthermore, a sliding-window approach to feature extraction makes the system easy to train and enables low-delay synthesis. We show that this combination of speech features improves visual speech synthesis; a subjective user study confirms our findings.
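
The abstract sketches the pipeline at a high level only. Below is a minimal Python sketch of the mapping stage, assuming MFCCs as the acoustic features, one-hot encoding for the phonemic labels, and scikit-learn's multi-output RandomForestRegressor; the window width, phoneme inventory size, feature dimensions, and parameter count are illustrative stand-ins rather than values from the paper, and the training arrays are random placeholders for data that would come from the registered mesh sequence and the aligned audio.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

N_PHONEMES = 40    # assumed phoneme inventory size (not from the paper)
N_MFCC = 13        # assumed number of acoustic coefficients per frame
WINDOW = 11        # assumed sliding-window width in frames
N_PARAMS = 30      # assumed number of face-model parameters

def combined_features(mfcc, phoneme_id):
    """Per frame: acoustic features concatenated with a one-hot phonemic label."""
    one_hot = np.eye(N_PHONEMES)[phoneme_id]           # (T, N_PHONEMES)
    return np.hstack([mfcc, one_hot])                  # (T, N_MFCC + N_PHONEMES)

def sliding_windows(feats, width=WINDOW):
    """Stack `width` neighbouring frames into one feature vector per frame."""
    half = width // 2
    padded = np.pad(feats, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + width].ravel() for t in range(len(feats))])

# Stand-in training data; in practice the inputs come from phoneme-labeled
# audio and the targets from the pre-registered 3D mesh sequence.
rng = np.random.default_rng(0)
T = 500
mfcc = rng.normal(size=(T, N_MFCC))
phoneme_id = rng.integers(0, N_PHONEMES, size=T)
face_params = rng.normal(size=(T, N_PARAMS))

# Training: windowed combined features -> face-model parameters.
X = sliding_windows(combined_features(mfcc, phoneme_id))
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, face_params)

# Synthesis: each frame of new speech maps directly to parameters.
predicted = forest.predict(X[:10])
print(predicted.shape)                                 # (10, N_PARAMS)

One plausible reading of the robustness claim is that concatenating one-hot labels with acoustic features lets the regressor fall back on the audio when the phonemic alignment is locally wrong; likewise, a centered window implies only half a window of look-ahead, which is what keeps the synthesis delay low.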

Keywords

Facial Animation, Lip Synchronization, Speech Features, Visual Speech Synthesis

Cite this

Kuhnke, F., & Ostermann, J. (2017). Visual speech synthesis from 3D mesh sequences driven by combined speech features. In 2017 IEEE International Conference on Multimedia and Expo: ICME 2017 (pp. 1075-1080). Article 8019546 (Proceedings - IEEE International Conference on Multimedia and Expo). IEEE Computer Society. https://doi.org/10.1109/icme.2017.8019546

@inproceedings{07e3e28e69eb4517b9945274201e4c6b,
title = "Visual speech synthesis from 3D mesh sequences driven by combined speech features",
abstract = "Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure that produces realistic speech animation for arbitrary speech input. Speech features are mapped to model parameters using random forest regression. We propose a new speech feature that combines phonemic labels with acoustic features; this combined feature produces more expressive facial animation and robustly handles temporal labeling errors. Furthermore, a sliding-window approach to feature extraction makes the system easy to train and enables low-delay synthesis. We show that this combination of speech features improves visual speech synthesis; a subjective user study confirms our findings.",
keywords = "Facial Animation, Lip Synchronization, Speech Features, Visual Speech Synthesis",
author = "Felix Kuhnke and J{\"o}rn Ostermann",
year = "2017",
month = aug,
day = "28",
doi = "10.1109/icme.2017.8019546",
language = "English",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
publisher = "IEEE Computer Society",
pages = "1075--1080",
booktitle = "2017 IEEE International Conference on Multimedia and Expo: ICME 2017",
address = "United States",
note = "2017 IEEE International Conference on Multimedia and Expo, ICME 2017 ; Conference date: 10-07-2017 Through 14-07-2017",
}

