A Real-Time 3D Visual Singing Synthesis: From Appearance to Internal Articulators

Cited by: 1

Authors:

Yu, Jun [1]

Affiliations:

[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Anhui, Peoples R China

Source:

MULTIMEDIA MODELING (MMM 2017), PT I | 2017 / Vol. 10132

Funding:

National Natural Science Foundation of China;

Keywords:

Articulatory animation; Song-to-articulator mapping; SPEECH SYNTHESIS; MODEL; GENERATION;

DOI:

10.1007/978-3-319-51811-4_5

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

A facial animation system is proposed for visual singing synthesis. With a reconstructed 3D head mesh model, both the finite element method and an anatomical model are used to simulate the articulatory deformation corresponding to each phoneme with its musical note. Based on an articulatory song corpus, articulatory movements, phonemes, and musical notes are trained simultaneously with a context-dependent Hidden Markov Model to obtain the visual co-articulation model. The articulatory animations corresponding to all phonemes are then concatenated by the visual co-articulation model to produce song-synchronized articulatory animation. Experimental results demonstrate, both objectively and subjectively, that the system can synthesize realistic song-synchronized articulatory animation and thereby improve human-computer interaction capability.


Pages: 53-64

Page count: 12

