Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

Kang Liu; Joern Ostermann

doi:10.1007/978-3-642-03767-2_147

Details

Original language	English
Title of host publication	Computer Analysis of Images and Patterns
Subtitle of host publication	13th International Conference, CAIP 2009, Proceedings
Pages	1212-1219
Number of pages	8
ISBN (electronic)	978-3-642-03767-2
Publication status	Published - 2009
Event	13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009, Münster, Germany; Duration: 2 Sept 2009 → 4 Sept 2009

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	5702 LNCS
ISSN (Print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

Image-based modeling has been very successful in creating realistic facial animations. Dialog-system applications, such as e-Learning and customer information services, can integrate facial animations with synthesized speech on websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, key mouth images are identified by clustering algorithms, and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. Clustering algorithms based on MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree), and LBG are developed and evaluated. Our experiments demonstrate that the LBG-based clustering algorithm reduces the number of mouth images and that JPEG further compresses the resulting database to 8 MB. This minimized database generates facial animations in CIF format without loss of naturalness and fulfills the requirements of talking heads for Internet applications.
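The first minimization step (clustering the mouth images and keeping one key image per cluster) can be illustrated with a small sketch of LBG codebook design, i.e. Linde-Buzo-Gray codeword splitting followed by Lloyd (k-means style) refinement. This is not the authors' implementation: it assumes, hypothetically, that each mouth image has already been reduced to a numeric feature vector (for example downsampled pixels or PCA coefficients), and that the key image of a cluster is the real database image closest to its codeword, since an averaged mouth would not be a usable frame. The function names `lbg` and `key_images`, the split offset `eps`, and the stopping tolerance `tol` are illustrative choices; the abstract does not state the paper's features, distance measure, or cluster count.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign(data: Sequence[Sequence[float]], codebook: Sequence[Vector]) -> List[int]:
    """Index of the nearest codeword for every data vector (ties go to the lower index)."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k])) for x in data]


def lbg(data: Sequence[Sequence[float]], target_size: int,
        eps: float = 0.01, tol: float = 1e-4) -> Tuple[List[Vector], List[int]]:
    """LBG codebook design: split every codeword, then refine with Lloyd iterations.

    The codebook doubles on every split, so the final size is the smallest
    power of two that is >= target_size.
    """
    codebook: List[Vector] = [mean(data)]
    while len(codebook) < target_size:
        # Split each codeword into two copies shifted by +eps and -eps.
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1.0, -1.0)]
        prev = math.inf
        while True:
            labels = assign(data, codebook)
            dist = sum(sq_dist(x, codebook[k]) for x, k in zip(data, labels)) / len(data)
            # Stop once the relative drop in mean distortion is below tol.
            if prev - dist <= tol * dist:
                break
            prev = dist
            # Lloyd update: move each codeword to the mean of its cell;
            # a codeword whose cell is empty is left where it is.
            for k in range(len(codebook)):
                members = [x for x, lab in zip(data, labels) if lab == k]
                if members:
                    codebook[k] = mean(members)
    return codebook, assign(data, codebook)


def key_images(data: Sequence[Sequence[float]], codebook: Sequence[Vector],
               labels: Sequence[int]) -> List[int]:
    """Per non-empty cluster, the index of the real image nearest its codeword."""
    keys = []
    for k, cw in enumerate(codebook):
        members = [i for i, lab in enumerate(labels) if lab == k]
        if members:
            keys.append(min(members, key=lambda i: sq_dist(data[i], cw)))
    return sorted(keys)


if __name__ == "__main__":
    # Toy 2-D "feature vectors": two well-separated groups of three images each.
    features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    codebook, labels = lbg(features, target_size=2)
    print(key_images(features, codebook, labels))  # prints [0, 3]
```

In this sketch only the images returned by `key_images` would be kept, and they would then go through the second (JPEG) step. Growing the codebook by splitting, rather than seeding it randomly as plain k-means does, is the defining trait of LBG; the MST- and RSST-based alternatives named in the abstract would instead merge images along a spanning tree of pairwise distances.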

ASJC Scopus subject areas

Mathematics(all)
Theoretical Computer Science
Computer Science(all)
General Computer Science

Cite this

Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. / Liu, Kang; Ostermann, Joern.
Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings. 2009. p. 1212-1219 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5702 LNCS).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Liu, K & Ostermann, J 2009, Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. in Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5702 LNCS, pp. 1212-1219, 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009, Münster, Germany, 2 Sept 2009. https://doi.org/10.1007/978-3-642-03767-2_147

Liu, K., & Ostermann, J. (2009). Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. In Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings (pp. 1212-1219). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5702 LNCS). https://doi.org/10.1007/978-3-642-03767-2_147

Liu K, Ostermann J. Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. In Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings. 2009. p. 1212-1219. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-03767-2_147

Liu, Kang ; Ostermann, Joern. / Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings. 2009. pp. 1212-1219 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).


@inproceedings{344638634b504edb8e3e535394fc6d9f,
  title = "Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness",
  abstract = "Image-based modeling is very successful in the creation of realistic facial animations. Applications with dialog systems, such as e-Learning and customer information service, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading a database with 11,594 mouth images (about 120MB in JPEG format) used by talking head needs about 15 minutes at 150 kBps. This paper presents a prototype framework of two-step database minimization. First, the key mouth images are identified by clustering algorithms and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is lowered by the LBG-based clustering algorithm and further compressed to 8MB by JPEG, which generates facial animations in CIF format without loss of naturalness and fulfill the need of talking head for Internet applications.",
  author = "Kang Liu and Joern Ostermann",
  year = "2009",
  doi = "10.1007/978-3-642-03767-2_147",
  language = "English",
  isbn = "3642037666",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  pages = "1212--1219",
  booktitle = "Computer Analysis of Images and Patterns",
  note = "13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009 ; Conference date: 02-09-2009 Through 04-09-2009",
}


TY  - GEN
T1  - Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness
AU  - Liu, Kang
AU  - Ostermann, Joern
PY  - 2009
Y1  - 2009
N2  - Image-based modeling is very successful in the creation of realistic facial animations. Applications with dialog systems, such as e-Learning and customer information service, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading a database with 11,594 mouth images (about 120MB in JPEG format) used by talking head needs about 15 minutes at 150 kBps. This paper presents a prototype framework of two-step database minimization. First, the key mouth images are identified by clustering algorithms and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is lowered by the LBG-based clustering algorithm and further compressed to 8MB by JPEG, which generates facial animations in CIF format without loss of naturalness and fulfill the need of talking head for Internet applications.
AB  - Image-based modeling is very successful in the creation of realistic facial animations. Applications with dialog systems, such as e-Learning and customer information service, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading a database with 11,594 mouth images (about 120MB in JPEG format) used by talking head needs about 15 minutes at 150 kBps. This paper presents a prototype framework of two-step database minimization. First, the key mouth images are identified by clustering algorithms and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the number of mouth images is lowered by the LBG-based clustering algorithm and further compressed to 8MB by JPEG, which generates facial animations in CIF format without loss of naturalness and fulfill the need of talking head for Internet applications.
UR  - http://www.scopus.com/inward/record.url?scp=70349311713&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-03767-2_147
DO  - 10.1007/978-3-642-03767-2_147
M3  - Conference contribution
AN  - SCOPUS:70349311713
SN  - 3642037666
SN  - 9783642037665
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 1212
EP  - 1219
BT  - Computer Analysis of Images and Patterns
T2  - 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009
Y2  - 2 September 2009 through 4 September 2009
ER  - 

Research@Leibniz University


By the same author(s)

MaskCRT: Masked Conditional Residual Transformer for Learned Video Compression

Acoustic Emission Detection in Noisy Environments using Linear Prediction

Genie: the first open-source ISO/IEC encoder for genomic data

On the Rate-Distortion-Complexity Trade-Offs of Neural Video Coding

Self-supervised domain adaptation for machinery remaining useful life prediction
