Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

Kang Liu, Joern Ostermann

Details

Original language: English
Title of host publication: Computer Analysis of Images and Patterns
Subtitle of host publication: 13th International Conference, CAIP 2009, Proceedings
Pages: 1212-1219
Number of pages: 8
ISBN (electronic): 978-3-642-03767-2
Publication status: Published - 2009
Event: 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009 - Munster, Germany
Duration: 2 Sept 2009 to 4 Sept 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5702 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Image-based modeling is very successful in creating realistic facial animations. Applications with dialog systems, such as e-learning and customer information services, can integrate facial animations with synthesized speech into websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, key mouth images are identified by clustering algorithms, and similar mouth images are discarded. Second, the clustered key mouth images are further compressed with JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree), and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is lowered by the LBG-based clustering algorithm and that the database is further compressed to 8 MB by JPEG; the reduced database generates facial animations in CIF format without loss of naturalness and fulfills the needs of talking heads for Internet applications.
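The first minimization step and the download-time figure can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each mouth image has already been reduced to a numeric feature vector, uses a generic LBG scheme (split every centroid in two, then refine with Lloyd iterations), and takes the image nearest to each centroid as the key image. The names `lbg_key_images`, `refine`, `download_minutes`, the parameters `n_splits` and `eps`, and the toy feature values are all hypothetical. The download helper assumes that "kBps" means kilobytes per second and that 1 MB = 1000 kB.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(v, candidates):
    """Index of the candidate vector closest to v."""
    return min(range(len(candidates)), key=lambda k: sq_dist(v, candidates[k]))


def refine(features, centroids, max_iter=50):
    """Lloyd iterations: reassign vectors, then recompute centroids."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for v in features:
            groups[nearest(v, centroids)].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def lbg_key_images(features, n_splits, eps=0.01):
    """Return sorted indices of key images for a codebook of up to 2**n_splits entries.

    Hypothetical helper: the key image of a cluster is assumed to be the
    image whose feature vector lies closest to the cluster centroid.
    """
    centroids = [mean(features)]
    for _ in range(n_splits):
        split = []
        for c in centroids:
            # LBG splitting: perturb each centroid in two opposite directions.
            split.append([x + eps for x in c])
            split.append([x - eps for x in c])
        centroids = refine(features, split)
    return sorted({nearest(c, features) for c in centroids})


def download_minutes(size_mb, rate_kb_per_s):
    """Download time in minutes, assuming 1 MB = 1000 kB."""
    return size_mb * 1000 / rate_kb_per_s / 60


# Toy one-dimensional features standing in for mouth images (made-up values).
features = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
key_indices = lbg_key_images(features, n_splits=1)
print(key_indices)                          # indices of the retained key images
print(round(download_minutes(120, 150), 1)) # minutes for the 120 MB database
print(round(download_minutes(8, 150), 1))   # minutes for the 8 MB database
```

Under these assumptions the 120 MB database takes roughly 13 minutes to download, in line with the "about 15 minutes" quoted above, while the 8 MB database takes under a minute. Splitting-based LBG can settle in a local optimum, so the selected key images depend on the feature representation and on `eps`.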

Cite this

Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. / Liu, Kang; Ostermann, Joern.
Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings. 2009. p. 1212-1219 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5702 LNCS).

Liu, K & Ostermann, J 2009, Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. in Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5702 LNCS, pp. 1212-1219, 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009, Munster, Germany, 2 Sept 2009. https://doi.org/10.1007/978-3-642-03767-2_147
@inproceedings{344638634b504edb8e3e535394fc6d9f,
title = "Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness",
abstract = "Image-based modeling is very successful in creating realistic facial animations. Applications with dialog systems, such as e-learning and customer information services, can integrate facial animations with synthesized speech into websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, key mouth images are identified by clustering algorithms, and similar mouth images are discarded. Second, the clustered key mouth images are further compressed with JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree), and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is lowered by the LBG-based clustering algorithm and that the database is further compressed to 8 MB by JPEG; the reduced database generates facial animations in CIF format without loss of naturalness and fulfills the needs of talking heads for Internet applications.",
author = "Kang Liu and Joern Ostermann",
year = "2009",
doi = "10.1007/978-3-642-03767-2_147",
language = "English",
isbn = "3642037666",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "1212--1219",
booktitle = "Computer Analysis of Images and Patterns",
note = "13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009 ; Conference date: 02-09-2009 Through 04-09-2009",

}

TY - GEN

T1 - Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

AU - Liu, Kang

AU - Ostermann, Joern

PY - 2009

Y1 - 2009

N2 - Image-based modeling is very successful in creating realistic facial animations. Applications with dialog systems, such as e-learning and customer information services, can integrate facial animations with synthesized speech into websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, key mouth images are identified by clustering algorithms, and similar mouth images are discarded. Second, the clustered key mouth images are further compressed with JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree), and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is lowered by the LBG-based clustering algorithm and that the database is further compressed to 8 MB by JPEG; the reduced database generates facial animations in CIF format without loss of naturalness and fulfills the needs of talking heads for Internet applications.

UR - http://www.scopus.com/inward/record.url?scp=70349311713&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-03767-2_147

DO - 10.1007/978-3-642-03767-2_147

M3 - Conference contribution

AN - SCOPUS:70349311713

SN - 3642037666

SN - 9783642037665

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 1212

EP - 1219

BT - Computer Analysis of Images and Patterns

T2 - 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009

Y2 - 2 September 2009 through 4 September 2009

ER -
