"Cool glasses, where did you get them?" Generating Visually Grounded Conversation Starters for Human-Robot Dialogue

Cited by: 5
Authors
Janssens, Ruben [1 ]
Wolfert, Pieter [1 ]
Demeester, Thomas [1 ]
Belpaeme, Tony [1 ]
Affiliations
[1] Univ Ghent, IDLab, Imec, Ghent, Belgium
Source
PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '22) | 2022
Keywords
Human-Robot Interaction; multi-modal dialogue; conversational agent; Natural Language Generation; Natural Language Processing; situatedness; grounding
DOI
10.1109/HRI53351.2022.9889489
CLC (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Visually situated language interaction is an important challenge in multi-modal Human-Robot Interaction (HRI). In this context, we present a data-driven method for generating situated conversation starters based on visual context. We take visual data about the interactants and generate appropriate greetings for conversational agents in the context of HRI. For this, we constructed a novel open-source dataset of 4000 HRI-oriented images of people facing the camera, each annotated with three conversation-starting questions. We compared a retrieval-based baseline model with a generative model. Human evaluation of the models through crowdsourcing shows that the generative model scores best, particularly at correctly referencing visual features. We also investigated how automated metrics can be used as a proxy for human evaluation and found that common automated metrics are a poor substitute for human judgement. Finally, we provide a proof-of-concept demonstrator through an interaction with a Furhat social robot.
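
The abstract does not detail either model, but as a rough illustration of the kind of retrieval-based baseline it describes, the sketch below matches a new image against stored (image, question) pairs by embedding similarity. This is a minimal sketch under assumed choices: the use of CLIP, the checkpoint name, the file names, and the helpers embed_image and retrieve_starter are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of a retrieval-based conversation-starter baseline,
# assuming CLIP image embeddings and nearest-neighbour lookup; the
# paper's actual baseline architecture is not described in the abstract.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> torch.Tensor:
    # Encode one image into a unit-normalised CLIP feature vector.
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Hypothetical annotated pairs standing in for the paper's dataset of
# images, each paired with conversation-starting questions.
train_pairs = [
    ("person_with_glasses.jpg", "Cool glasses, where did you get them?"),
    ("person_in_red_shirt.jpg", "I like your red shirt - is red your favourite colour?"),
]
train_embs = torch.cat([embed_image(path) for path, _ in train_pairs])

def retrieve_starter(query_path: str) -> str:
    # Return the question attached to the most similar stored image;
    # the dot product equals cosine similarity for unit vectors.
    sims = train_embs @ embed_image(query_path).squeeze(0)
    return train_pairs[int(sims.argmax())][1]

print(retrieve_starter("new_interactant.jpg"))

A generative counterpart would instead condition a language model on the image features to produce a new question rather than reusing a stored one, which is the approach the paper's human evaluation favoured.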
Pages: 821-825
Page count: 5