"Cool glasses, where did you get them?" Generating Visually Grounded Conversation Starters for Human-Robot Dialogue

Cited by: 5
Authors
Janssens, Ruben [1 ]
Wolfert, Pieter [1 ]
Demeester, Thomas [1 ]
Belpaeme, Tony [1 ]
Affiliations
[1] Univ Ghent, IDLab, Imec, Ghent, Belgium
Source
PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '22) | 2022
Keywords
Human-Robot Interaction; multi-modal dialogue; conversational agent; Natural Language Generation; Natural Language Processing; situatedness; grounding
DOI
10.1109/HRI53351.2022.9889489
CLC (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Visually situated language interaction is an important challenge in multi-modal Human-Robot Interaction (HRI). In this context, we present a data-driven method for generating situated conversation starters based on visual context. We take visual data about the interactants and generate appropriate greetings for conversational agents in the context of HRI. For this, we constructed a novel open-source dataset of 4000 HRI-oriented images of people facing the camera, each annotated with three conversation-starting questions. We compared a retrieval-based baseline model with a generative model. Human evaluation of the models through crowdsourcing shows that the generative model scores best, particularly at correctly referencing visual features. We also investigated how automated metrics can be used as a proxy for human evaluation and found that common automated metrics are a poor substitute for human judgement. Finally, we provide a proof-of-concept demonstrator through an interaction with a Furhat social robot.
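
The abstract does not detail either model, but as a rough illustration of the kind of retrieval-based baseline it describes, the sketch below matches a new image against stored (image, question) pairs by embedding similarity. This is a minimal sketch under assumed choices: the use of CLIP, the checkpoint name, the file names, and the helpers embed_image and retrieve_starter are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of a retrieval-based conversation-starter baseline,
# assuming CLIP image embeddings and nearest-neighbour lookup; the
# paper's actual baseline architecture is not described in the abstract.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> torch.Tensor:
    # Encode one image into a unit-normalised CLIP feature vector.
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Hypothetical annotated pairs standing in for the paper's dataset of
# images, each paired with conversation-starting questions.
train_pairs = [
    ("person_with_glasses.jpg", "Cool glasses, where did you get them?"),
    ("person_in_red_shirt.jpg", "I like your red shirt - is red your favourite colour?"),
]
train_embs = torch.cat([embed_image(path) for path, _ in train_pairs])

def retrieve_starter(query_path: str) -> str:
    # Return the question attached to the most similar stored image;
    # the dot product equals cosine similarity for unit vectors.
    sims = train_embs @ embed_image(query_path).squeeze(0)
    return train_pairs[int(sims.argmax())][1]

print(retrieve_starter("new_interactant.jpg"))

A generative counterpart would instead condition a language model on the image features to produce a new question rather than reusing a stored one, which is the approach the paper's human evaluation favoured.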
Pages: 821-825
Page count: 5