Comparing Cascaded LSTM Architectures for Generating Head Motion from Speech in Task-Oriented Dialogs

Cited by: 2
Authors:
Nguyen, Duc-Canh [1 ,2 ]
Bailly, Gerard [1 ,2 ]
Elisei, Frederic [1 ,2 ]
Affiliations:
[1] Grenoble Alpes Univ, GIPSA Lab, Grenoble, France
[2] CNRS, Grenoble, France
Keywords:
Head motion generation; Human interactions; Multi-tasks learning; LSTM; Human-robot interaction
DOI:
10.1007/978-3-319-91250-9_13
CLC Number:
TP301 [Theory, Methods]
Subject Classification Code:
081202
Abstract:
To generate action events for a humanoid robot in human-robot interaction (HRI), multimodal interactive behavioral models are typically conditioned on the observed actions of the human partner(s). In previous research, we built an interactive model that generates discrete gaze and arm-gesture events, which can be used to drive our iCub humanoid robot [19, 20]. In this paper, we investigate how to generate continuous head motion in a collaborative scenario where head motion contributes to both verbal and nonverbal functions. We show that in this scenario the fundamental frequency of speech (the F0 feature) alone is not sufficient to drive head motion, whereas gaze contributes significantly to head motion generation. We propose a cascaded long short-term memory (LSTM) model that first estimates gaze from the speech content and the hand gestures performed by the partner, and then uses this estimate as an additional input for head motion generation. The results show that the proposed method outperforms a single-task model with the same inputs.
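The cascaded design described in the abstract can be pictured as two stacked sequence models: a first LSTM maps speech and partner hand-gesture features to a gaze estimate, and a second LSTM consumes the speech features together with that estimate to produce head motion. The following is a minimal PyTorch sketch of this idea, not the authors' implementation; the framework choice, module names, and all feature dimensions (26-D speech, 10-D gesture, 2-D gaze, 3-D head rotation) are illustrative assumptions.

```python
# Hypothetical sketch of a cascaded LSTM for head motion generation.
# Dimensions and layer sizes are assumptions, not values from the paper.
import torch
import torch.nn as nn

class CascadedHeadMotionModel(nn.Module):
    def __init__(self, speech_dim=26, gesture_dim=10,
                 gaze_dim=2, head_dim=3, hidden=64):
        super().__init__()
        # Stage 1: estimate gaze from speech content and partner hand gestures.
        self.gaze_lstm = nn.LSTM(speech_dim + gesture_dim, hidden,
                                 batch_first=True)
        self.gaze_out = nn.Linear(hidden, gaze_dim)
        # Stage 2: generate head motion from speech plus the estimated gaze.
        self.head_lstm = nn.LSTM(speech_dim + gaze_dim, hidden,
                                 batch_first=True)
        self.head_out = nn.Linear(hidden, head_dim)

    def forward(self, speech, gestures):
        # speech:   (batch, time, speech_dim)
        # gestures: (batch, time, gesture_dim)
        gaze_h, _ = self.gaze_lstm(torch.cat([speech, gestures], dim=-1))
        gaze = self.gaze_out(gaze_h)          # intermediate (auxiliary) task
        head_h, _ = self.head_lstm(torch.cat([speech, gaze], dim=-1))
        head = self.head_out(head_h)          # final task: head motion
        return gaze, head                     # both outputs can be supervised

# Usage: a batch of 4 sequences, 100 frames each.
model = CascadedHeadMotionModel()
gaze, head = model(torch.randn(4, 100, 26), torch.randn(4, 100, 10))
print(gaze.shape, head.shape)  # torch.Size([4, 100, 2]) torch.Size([4, 100, 3])
```

Returning both outputs reflects the multi-task flavor of the cascade: supervising the intermediate gaze estimate as well as the final head motion is one plausible way such a model could outperform a single-task network given the same inputs.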
Pages: 164-175 (12 pages)