A Survey of Multimodal Perception Methods for Human-Robot Interaction in Social Environments

Cited by: 1
Authors
Duncan, John A. [1 ]
Alambeigi, Farshid [1 ]
Pryor, Mitchell W. [1 ]
Affiliations
[1] The University of Texas at Austin, Austin, TX 78712, USA
Keywords
Human-robot interaction; multimodal perception; situated interaction; social robotics; human social environments
Keywords Plus
user engagement; sound sources; localization; recognition; design; fusion; system; framework; network; dataset
DOI
10.1145/3657030
Chinese Library Classification
TP24 [Robotics]
Discipline Classification Codes
080202; 1405
Abstract
Human-robot interaction (HRI) in human social environments (HSEs) poses unique challenges for robot perception systems, which must combine asynchronous, heterogeneous data streams in real time. Multimodal perception systems are well suited to HRI in HSEs and can enable richer, more robust interaction for robots operating among humans. In this article, we provide an overview of multimodal perception systems used in HSEs, intended as an introduction to the topic and a summary of relevant trends, techniques, resources, challenges, and terminology. We surveyed 15 peer-reviewed robotics and HRI publications from the past 10+ years, detailing the data acquisition, processing, and fusion techniques used in 65 multimodal perception systems across various HRI domains. The survey covers the hardware, software, datasets, and methods currently available for HRI perception research, as well as how these perception systems are applied in HSEs. Based on the survey, we summarize trends, challenges, and limitations of multimodal human perception systems for robots, identify resources for researchers and developers, and propose future research areas to advance the field.
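The core technical challenge the abstract names, combining asynchronous, heterogeneous data streams in real time, is commonly handled with soft time synchronization: each modality is buffered, and measurements are fused only when their timestamps agree within a tolerance. The Python sketch below illustrates that pattern for a hypothetical audio/vision pair; it is not code from the article, and all names (Stamped, NearestTimeFuser, the 50 ms tolerance) are illustrative assumptions.

```python
# A minimal sketch (not from the surveyed article) of soft time
# synchronization for two asynchronous sensor streams: buffer each
# stream, then pair measurements whose timestamps fall within `tol`.
from collections import deque
from dataclasses import dataclass

@dataclass
class Stamped:
    t: float        # timestamp in seconds
    value: object   # modality-specific payload (e.g., DOA angle, face bbox)

class NearestTimeFuser:
    """Pairs the newest measurements from two streams whose timestamps
    differ by at most `tol` seconds (a soft-synchronization policy)."""
    def __init__(self, tol: float = 0.05, maxlen: int = 64):
        self.tol = tol
        self.audio = deque(maxlen=maxlen)   # ring buffers bound memory
        self.vision = deque(maxlen=maxlen)  # for streams of unequal rate

    def push_audio(self, m: Stamped):
        self.audio.append(m)
        return self._try_fuse()

    def push_vision(self, m: Stamped):
        self.vision.append(m)
        return self._try_fuse()

    def _try_fuse(self):
        if not self.audio or not self.vision:
            return None
        a, v = self.audio[-1], self.vision[-1]
        # Fuse only when the newest measurements are close enough in
        # time; otherwise keep buffering until the slower stream catches up.
        if abs(a.t - v.t) <= self.tol:
            return (a, v)
        return None

if __name__ == "__main__":
    fuser = NearestTimeFuser(tol=0.05)
    fuser.push_audio(Stamped(t=0.000, value={"doa_deg": 30.0}))
    pair = fuser.push_vision(Stamped(t=0.030, value={"face_bbox": (40, 60, 120, 160)}))
    if pair:
        print("fused within tolerance:", pair[0].value, pair[1].value)
```

Production systems apply the same idea with more sophisticated queueing; for example, ROS's message_filters package provides an ApproximateTime synchronization policy for exactly this buffering-and-pairing problem.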
Pages: 50