Template-Warping Based Speech Driven Head Motion Synthesis

Cited by: 0

Authors:

Braude, David Adam [1]

Shimodaira, Hiroshi [1]

Ben Youssef, Atef [1]

Affiliations:

[1] Univ Edinburgh, Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Funding:

National Research Foundation, Singapore;

Keywords:

Head motion synthesis; GMMs; IOMM; ANIMATION; PROSODY;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a method for synthesising head motion from speech using a combination of an Input-Output Markov model (IOMM) and Gaussian mixture models (GMMs) trained in a supervised manner. A key difference from other approaches is that the head motion in each angle is modelled as a series of motion templates rather than recovered through a frame-wise function. The templates were chosen to reflect natural patterns in the head motion, and the states for the IOMM were chosen based on statistics of the templates. This reduces the search space for the trajectories and rules out impossible motions such as discontinuities. For synthesis, our system warps the templates to account for the acoustic features and the other angles' warping parameters. We show that our system is capable of recovering the statistics of the motion that were chosen for the states. Our system was then compared to a baseline using a frame-wise mapping based on previously published work. A subjective preference test that included multiple speakers showed that participants preferred the segment-based approach. Both systems were trained on storytelling free speech.

Citation

Pages: 2762-2766

Page count: 5

15 entries in total

[1] [Anonymous], INTERSPEECH

[2] [Anonymous], 2006, MATRIX

[3] Bengio Y., 1995, Advances in Neural Information Processing Systems 7, P427

[4] Busso, C; Deng, ZG; Neumann, U. Natural head motion synthesis driven by acoustic prosodic features [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2005, 16 (3-4): 283-290

[5] Busso, Carlos; Deng, Zhigang; Grimm, Michael; Neumann, Ulrich; Narayanan, Shrikanth. Rigid head motion in expressive speech animation: Analysis and synthesis [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03): 1075-1086

[6] Eyben F., 2010, P ACM MULT MM ACM

[7] Graf, HP; Cosatto, E; Strom, V; Huang, FJ. Visual prosody: Facial movements accompanying speech [J]. FIFTH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, PROCEEDINGS, 2002: 396-401

[8] HADAR, U; STEINER, TJ; GRANT, EC; ROSE, FC. KINEMATICS OF HEAD MOVEMENTS ACCOMPANYING SPEECH DURING CONVERSATION [J]. HUMAN MOVEMENT SCIENCE, 1983, 2 (1-2): 35-46

[9] Hofer G. O., 2009, THESIS

[10] Kuratate T., 1999, Eurospeech'99, V3, P1279
