Accurate automatic visible speech synthesis of arbitrary 3D models based on concatenation of diviseme motion capture data

Cited by: 17
Authors
Ma, JY [1]
Cole, R [1]
Pellom, B [1]
Ward, W [1]
Wise, B [1]
Affiliation
[1] Univ Colorado, Ctr Spoken Language Res, Boulder, CO 80309 USA
Keywords
visible speech; visual speech synthesis; animated speech; coarticulation modelling; speech animation; face animation;
DOI
10.1002/cav.11
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
We present a technique for accurate automatic visible speech synthesis from textual input. When provided with a speech waveform and the text of a spoken sentence, the system produces accurate visible speech synchronized with the audio signal. To develop the system, we collected motion capture data from a speaker's face during production of a set of words containing all diviseme sequences in English. The motion capture points from the speaker's face are retargeted to the vertices of the polygons of a 3D face model. When synthesizing a new utterance, the system locates the required sequence of divisemes, shrinks or expands each diviseme based on the desired phoneme segment durations in the target utterance, then moves the polygons in the regions of the lips and lower face to correspond to the spatial coordinates of the motion capture data. The motion mapping is realized by a key-shape mapping function learned by a set of viseme examples in the source and target faces. A well-posed numerical algorithm estimates the shape blending coefficients. Time warping and motion vector blending at the juncture of two divisemes and the algorithm to search the optimal concatenated visible speech are also developed to provide the final concatenative motion sequence. Copyright (C) 2004 John Wiley & Sons, Ltd.
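The abstract's "shrinks or expands each diviseme" and "motion vector blending at the juncture of two divisemes" steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the warping or blending functions, so linear resampling and a linear cross-fade are assumed here, and the names `time_warp`, `concatenate`, and the flat per-frame coordinate layout are hypothetical.

```python
from typing import List

# Assumption: one frame is a flat list of marker coordinates.
Frame = List[float]


def time_warp(frames: List[Frame], target_len: int) -> List[Frame]:
    """Linearly resample a diviseme segment to target_len frames (assumed scheme)."""
    if target_len < 1 or not frames:
        raise ValueError("need at least one source frame and target_len >= 1")
    if target_len == 1 or len(frames) == 1:
        return [list(frames[0]) for _ in range(target_len)]
    scale = (len(frames) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), len(frames) - 2)  # index of the left neighbour frame
        t = pos - lo                         # interpolation weight in [0, 1]
        out.append([(1 - t) * a + t * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


def concatenate(a: List[Frame], b: List[Frame], overlap: int) -> List[Frame]:
    """Join two segments, cross-fading the last `overlap` frames of `a`
    with the first `overlap` frames of `b` (assumed linear weights)."""
    if overlap < 1 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both segments")
    start = len(a) - overlap
    blended = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # weight of segment b grows across the juncture
        blended.append([(1 - w) * x + w * y
                        for x, y in zip(a[start + k], b[k])])
    return a[:start] + blended + b[overlap:]


if __name__ == "__main__":
    stretched = time_warp([[0.0], [2.0], [4.0]], 5)
    joined = concatenate([[0.0]] * 3, [[3.0]] * 3, overlap=2)
    print(len(stretched), len(joined))
```

In this sketch each diviseme would first be warped to the duration implied by the target phoneme segments, then joined to its neighbour; the search for the optimal concatenated sequence mentioned in the abstract is not covered.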
Pages: 485-500
Page count: 16
Related papers
37 in total
  • [1] [Anonymous], 2003, PROCEEDINGS OF IEEE
  • [2] Bai Z., 2000, TEMPLATES SOLUTION A, DOI 10.1137/1.9780898719581
  • [3] Bellman R., 1957, DYNAMIC PROGRAMMING
  • [4] A morphable model for the synthesis of 3D faces
    Blanz, V
    Vetter, T
    [J]. SIGGRAPH 99 CONFERENCE PROCEEDINGS, 1999, : 187 - 194
  • [5] Reanimating faces in images and video
    Blanz, V
    Basso, C
    Poggio, T
    Vetter, T
    [J]. COMPUTER GRAPHICS FORUM, 2003, 22 (03) : 641 - 650
  • [6] Bregler C, 2002, ACM T GRAPHIC, V21, P399, DOI 10.1145/566570.566595
  • [7] Bregler C, 1997, P 24 ANN C COMP GRAP, V97, P353, DOI 10.1145/258734.258880
  • [8] Facial expression space learning
    Chuang, ES
    Deshpande, H
    Bregler, C
    [J]. 10TH PACIFIC CONFERENCE ON COMPUTER GRAPHICS AND APPLICATIONS, PROCEEDINGS, 2002, : 68 - 76
  • [9] Cohen M. M., 1993, Models and Techniques in Computer Animation, P139
  • [10] Perceptive animated interfaces: First steps toward a new paradigm for human-computer interaction
    Cole, R
    Van Vuuren, S
    Pellom, B
    Hacioglu, K
    Ma, JY
    Movellan, J
    Schwartz, S
    Wade-Stein, D
    Ward, W
    Yan, J
    [J]. PROCEEDINGS OF THE IEEE, 2003, 91 (09) : 1391 - 1405