Understanding the Predictability of Gesture Parameters from Speech and their Perceptual Importance

Cited by: 11

Authors:

Ferstl, Ylva ^{[1]}

Neff, Michael ^{[2]}

McDonnell, Rachel ^{[1]}

Affiliations:

[1] Trinity Coll Dublin, Dublin, Ireland

[2] Univ Calif Davis, Davis, CA 95616 USA

Source:

PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS (ACM IVA 2020) | 2020

Funding:

Science Foundation Ireland;

Keywords:

speech gestures; machine learning; perception; gesture modelling;

DOI:

10.1145/3383652.3423882

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Gesture behavior is a natural part of human conversation. Much work has focused on removing the need for tedious hand-animation to create embodied conversational agents by designing speech-driven gesture generators. However, these generators often work in a black-box manner, assuming a general relationship between input speech and output motion. As their success remains limited, we investigate in more detail how speech may relate to different aspects of gesture motion. We determine a number of parameters characterizing gesture, such as speed and gesture size, and explore their relationship to the speech signal in a two-fold manner. First, we train multiple recurrent networks to predict the gesture parameters from speech to understand how well gesture attributes can be modeled from speech alone. We find that gesture parameters can be partially predicted from speech, with some parameters, such as path length, predicted more accurately than others, such as velocity. Second, we design a perceptual study to assess the importance of each gesture parameter for producing motion that people perceive as appropriate for the speech. Results show that a degradation in any parameter was viewed negatively, but some changes, such as hand shape, are more impactful than others. A video summary can be found at https://youtu.be/aw6-_5kmLjY.

References

Pages: 8

36 items in total

[1] Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows [J].

Alexanderson, Simon ;

Henter, Gustav Eje ;

Kucherenko, Taras ;

Beskow, Jonas .

COMPUTER GRAPHICS FORUM, 2020, 39 (02) :487-496

[2]

[Anonymous], 2015, ORDINAL REGRESSION M

[3]

[Anonymous], 2013, P 12 ACM SIGGRAPH EU, DOI DOI 10.1145/2485895.2485900

[4]

[Anonymous], 2013, INT WORKSH INT VIRT

[5]

Bergmann K, 2009, LECT NOTES ARTIF INT, V5773, P76, DOI 10.1007/978-3-642-04380-2_12

[6] Multimodal analysis of speech and arm motion for prosody-driven synthesis of beat gestures [J].

Bozkurt, Elif ;

Yemez, Yucel ;

Erzin, Engin .

SPEECH COMMUNICATION, 2016, 85 :29-42

[7]

Cassell J, 2001, COMP GRAPH, P477, DOI 10.1145/383259.383315

[8]

Castillo G, 2019, AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, P702

[9]

Chiu CC, 2014, AAMAS'14: PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, P781

[10]

Chung-Cheng Chiu, 2011, Intelligent Virtual Agents. Proceedings 11th International Conference, IVA 2011, P127, DOI 10.1007/978-3-642-23974-8_14
