GATED MULTIMODAL FUSION WITH CONTRASTIVE LEARNING FOR TURN-TAKING PREDICTION IN HUMAN-ROBOT DIALOGUE

Cited by: 6
Authors
Yang, Jiudong [1]
Wang, Peiying [1]
Zhu, Yi [1,2]
Feng, Mingchao [1]
Chen, Meng [1]
He, Xiaodong [1]
Affiliations
[1] JD AI, Beijing, People's Republic of China
[2] Univ Cambridge, LTL, Cambridge, England
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Funding
National Key R&D Program of China;
Keywords
Multimodal Fusion; Turn-taking; Barge-in; Endpointing; Spoken Dialogue System;
DOI
10.1109/ICASSP43922.2022.9747056
CLC number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Turn-taking, which aims to decide when the next speaker can start talking, is an essential component in building human-robot spoken dialogue systems. Previous studies indicate that multimodal cues can facilitate this challenging task. However, due to the paucity of public multimodal datasets, current methods are mostly limited to either unimodal features or simplistic multimodal ensemble models. Besides, the inherent class imbalance in real scenarios (e.g., a sentence ending with a short pause is usually regarded as the end of a turn) also poses a great challenge to the turn-taking decision. In this paper, we first collect a large-scale annotated corpus for turn-taking with over 5,000 real human-robot dialogues in speech and text modalities. Then, a novel gated multimodal fusion mechanism is devised to seamlessly utilize information from both modalities for turn-taking prediction. More importantly, to tackle the data imbalance issue, we design a simple yet effective data augmentation method that constructs negative instances without supervision, and apply contrastive learning to obtain better feature representations. Extensive experiments are conducted, and the results demonstrate the superiority of our model over several state-of-the-art baselines.
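
To make the described approach more concrete, below is a minimal, illustrative sketch of how a gated fusion of text and audio features and an InfoNCE-style contrastive objective over augmented negatives could be wired together in PyTorch. The abstract does not include code, so the module names, feature dimensions, gate formulation, and the exact loss below are assumptions for illustration, not the authors' implementation.

# Illustrative sketch only: architecture details, dimensions, and the loss are
# assumptions; the paper's actual gated fusion and contrastive objective may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedMultimodalFusion(nn.Module):
    """Hypothetical gated fusion of a text embedding and an audio embedding."""

    def __init__(self, text_dim: int, audio_dim: int, hidden_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # The gate decides, per hidden dimension, how much each modality contributes.
        self.gate = nn.Linear(text_dim + audio_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, 2)  # end-of-turn vs. hold

    def forward(self, text_feat: torch.Tensor, audio_feat: torch.Tensor):
        h_text = torch.tanh(self.text_proj(text_feat))
        h_audio = torch.tanh(self.audio_proj(audio_feat))
        z = torch.sigmoid(self.gate(torch.cat([text_feat, audio_feat], dim=-1)))
        fused = z * h_text + (1.0 - z) * h_audio
        return self.classifier(fused), fused


def info_nce_loss(anchor, positive, negatives, temperature: float = 0.1):
    """Generic InfoNCE-style contrastive loss over augmented negatives
    (an assumption; the paper's exact contrastive objective is not given here)."""
    anchor = F.normalize(anchor, dim=-1)        # (B, D)
    positive = F.normalize(positive, dim=-1)    # (B, D)
    negatives = F.normalize(negatives, dim=-1)  # (K, D)
    pos_sim = (anchor * positive).sum(-1, keepdim=True) / temperature  # (B, 1)
    neg_sim = anchor @ negatives.t() / temperature                      # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    model = GatedMultimodalFusion(text_dim=768, audio_dim=128, hidden_dim=256)
    text = torch.randn(4, 768)    # e.g. transcript encoder output (assumed size)
    audio = torch.randn(4, 128)   # e.g. acoustic/prosodic features (assumed size)
    logits, fused = model(text, audio)
    cls_loss = F.cross_entropy(logits, torch.randint(0, 2, (4,)))
    # Stand-ins for an augmented positive view and unsupervised negative instances.
    positive_view = fused + 0.01 * torch.randn_like(fused)
    negatives = torch.randn(8, 256)
    ctr_loss = info_nce_loss(fused, positive_view, negatives)
    total_loss = cls_loss + ctr_loss
    print(logits.shape, total_loss.item())

In this sketch the sigmoid gate weighs the two modalities per dimension, in the spirit of the gated fusion described above, and the contrastive term would be added to the classification loss during training; how the paper actually constructs negatives and combines the losses is not specified in the abstract.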
Pages: 7747 - 7751
Page count: 5