Details
Original language | English |
---|---|
Title of host publication | Computer Analysis of Images and Patterns |
Subtitle of host publication | 13th International Conference, CAIP 2009, Proceedings |
Pages | 1212-1219 |
Number of pages | 8 |
ISBN (electronic) | 978-3-642-03767-2 |
Publication status | Published - 2009 |
Event | 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009 - Munster, Germany; Duration: 2 Sept 2009 → 4 Sept 2009 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 5702 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Abstract
Image-based modeling is very successful in the creation of realistic facial animations. Applications with dialog systems, such as e-Learning and customer information services, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, the key mouth images are identified by clustering algorithms and similar mouth images are discarded. Second, the clustered key mouth images are further compressed by JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree) and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the LBG-based clustering algorithm lowers the number of mouth images and that JPEG further compresses the database to 8 MB. The reduced database generates facial animations in CIF format without loss of naturalness and fulfills the needs of a talking head for Internet applications.
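The first step described in the abstract, selecting key mouth images by LBG-based clustering, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each mouth image is already represented by a small numeric feature vector (the abstract does not say which features are used), it uses a plain LBG splitting scheme with an additive perturbation, and it picks the real sample nearest to each codeword as the "key image". The function names `lbg` and `key_indices` and all parameter values are hypothetical.

```python
# Minimal LBG (Linde-Buzo-Gray) vector quantization sketch in pure Python.
# Assumption: every mouth image is represented by a feature vector (list of floats).

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def nearest(codebook, p):
    """Index of the codeword closest to point p."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], p))

def lloyd(points, codebook, iters=20):
    """Refine the codebook: assign points to cells, then recompute centroids."""
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for p in points:
            cells[nearest(codebook, p)].append(p)
        # An empty cell keeps its old codeword.
        new = [centroid(c) if c else cw for c, cw in zip(cells, codebook)]
        if new == codebook:
            break
        codebook = new
    return codebook

def lbg(points, size, eps=0.01):
    """Grow a codebook by repeated splitting; size should be a power of two."""
    codebook = [centroid(points)]
    while len(codebook) < size:
        # Split every codeword into two slightly shifted copies (additive split).
        codebook = [[x + s * eps for x in cw] for cw in codebook for s in (1, -1)]
        codebook = lloyd(points, codebook)
    return codebook

def key_indices(points, codebook):
    """Indices of the real samples closest to each codeword (assumed key images)."""
    return sorted({min(range(len(points)), key=lambda i: sq_dist(points[i], cw))
                   for cw in codebook})

# Toy data: two well-separated groups of 2-D "feature vectors".
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [10.0, 10.0], [10.2, 9.9], [9.9, 10.1]]
codebook = lbg(points, 2)
print(key_indices(points, codebook))  # [1, 3]
```

In this toy run, six vectors are reduced to two representatives; the remaining, similar samples would be discarded before the second (JPEG compression) step. How the codebook size is chosen so that naturalness is preserved is a question the paper addresses and this sketch does not.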
ASJC Scopus subject areas
- Mathematics(all)
- Theoretical Computer Science
- Computer Science(all)
Cite this
Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness. / Liu, Kang; Ostermann, Joern. Computer Analysis of Images and Patterns: 13th International Conference, CAIP 2009, Proceedings. 2009. p. 1212-1219 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5702 LNCS).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness
AU - Liu, Kang
AU - Ostermann, Joern
PY - 2009
Y1 - 2009
UR - http://www.scopus.com/inward/record.url?scp=70349311713&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-03767-2_147
DO - 10.1007/978-3-642-03767-2_147
M3 - Conference contribution
AN - SCOPUS:70349311713
SN - 3642037666
SN - 9783642037665
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 1212
EP - 1219
BT - Computer Analysis of Images and Patterns
T2 - 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009
Y2 - 2 September 2009 through 4 September 2009
ER -