Comparing Cascaded LSTM Architectures for Generating Head Motion from Speech in Task-Oriented Dialogs

Cited by: 2
Authors:
Nguyen, Duc-Canh [1 ,2 ]
Bailly, Gerard [1 ,2 ]
Elisei, Frederic [1 ,2 ]
Affiliations:
[1] Grenoble Alpes Univ, GIPSA Lab, Grenoble, France
[2] CNRS, Grenoble, France
Keywords:
Head motion generation; Human interactions; Multi-tasks learning; LSTM; Human-robot interaction
DOI:
10.1007/978-3-319-91250-9_13
CLC Number:
TP301 [Theory, Methods]
Subject Classification Code:
081202
Abstract:
To generate action events for a humanoid robot in human-robot interaction (HRI), multimodal interactive behavioral models are typically conditioned on the observed actions of the human partner(s). In previous research, we built an interactive model that generates discrete gaze and arm-gesture events, which can be used to drive our iCub humanoid robot [19, 20]. In this paper, we investigate how to generate continuous head motion in a collaborative scenario where head motion contributes to both verbal and nonverbal functions. We show that in this scenario the fundamental frequency of speech (the F0 feature) alone is not sufficient to drive head motion, whereas gaze contributes significantly to head motion generation. We propose a cascaded long short-term memory (LSTM) model that first estimates gaze from the speech content and the hand gestures performed by the partner, and then uses this estimate as an additional input for head motion generation. The results show that the proposed method outperforms a single-task model with the same inputs.
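The cascaded design described in the abstract can be pictured as two stacked sequence models: a first LSTM maps speech and partner hand-gesture features to a gaze estimate, and a second LSTM consumes the speech features together with that estimate to produce head motion. The following is a minimal PyTorch sketch of this idea, not the authors' implementation; the framework choice, module names, and all feature dimensions (26-D speech, 10-D gesture, 2-D gaze, 3-D head rotation) are illustrative assumptions.

```python
# Hypothetical sketch of a cascaded LSTM for head motion generation.
# Dimensions and layer sizes are assumptions, not values from the paper.
import torch
import torch.nn as nn

class CascadedHeadMotionModel(nn.Module):
    def __init__(self, speech_dim=26, gesture_dim=10,
                 gaze_dim=2, head_dim=3, hidden=64):
        super().__init__()
        # Stage 1: estimate gaze from speech content and partner hand gestures.
        self.gaze_lstm = nn.LSTM(speech_dim + gesture_dim, hidden,
                                 batch_first=True)
        self.gaze_out = nn.Linear(hidden, gaze_dim)
        # Stage 2: generate head motion from speech plus the estimated gaze.
        self.head_lstm = nn.LSTM(speech_dim + gaze_dim, hidden,
                                 batch_first=True)
        self.head_out = nn.Linear(hidden, head_dim)

    def forward(self, speech, gestures):
        # speech:   (batch, time, speech_dim)
        # gestures: (batch, time, gesture_dim)
        gaze_h, _ = self.gaze_lstm(torch.cat([speech, gestures], dim=-1))
        gaze = self.gaze_out(gaze_h)          # intermediate (auxiliary) task
        head_h, _ = self.head_lstm(torch.cat([speech, gaze], dim=-1))
        head = self.head_out(head_h)          # final task: head motion
        return gaze, head                     # both outputs can be supervised

# Usage: a batch of 4 sequences, 100 frames each.
model = CascadedHeadMotionModel()
gaze, head = model(torch.randn(4, 100, 26), torch.randn(4, 100, 10))
print(gaze.shape, head.shape)  # torch.Size([4, 100, 2]) torch.Size([4, 100, 3])
```

Returning both outputs reflects the multi-task flavor of the cascade: supervising the intermediate gaze estimate as well as the final head motion is one plausible way such a model could outperform a single-task network given the same inputs.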
Pages: 164-175 (12 pages)