NVP-HRI: Zero-shot natural voice and posture-based human-robot interaction via large language model

Cited by: 1
Authors
Lai, Yuzhi [1 ]
Yuan, Shenghai [2 ]
Nassar, Youssef [1 ]
Fan, Mingyu [3 ]
Weber, Thomas [1 ]
Raetsch, Matthias [1 ]
Affiliations
[1] Univ Reutlingen, Alteburgstr 150, D-72762 Reutlingen, Germany
[2] Nanyang Technol Univ, 50 Nanyang Ave, Singapore 639798, Singapore
[3] Donghua Univ, 849 Zhongshan West St 9, Shanghai 200051, Peoples R China
Funding
National Research Foundation of Singapore;
Keywords
Human-robot interaction; Intent recognition; Multi-modality perception; Large language models; Unsupervised interaction; HUMAN INTENT; IDENTIFICATION; RECOGNITION; EXTRACTION; OBJECTS; GRAPH; PATH;
D O I
10.1016/j.eswa.2024.126360
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Effective Human-Robot Interaction (HRI) is crucial for future service robots in aging societies. Existing solutions are biased towards well-trained objects only, creating a gap when dealing with new objects. Current HRI systems that rely on predefined gestures or language tokens for pretrained objects pose challenges for all individuals, especially the elderly: recalling commands, memorizing hand gestures, and learning new names are all difficult. This paper introduces NVP-HRI, an intuitive multi-modal HRI paradigm that combines voice commands with deictic posture. NVP-HRI uses the Segment Anything Model (SAM) to analyze visual cues and depth data, enabling precise structural object representation. Through the pre-trained SAM network, NVP-HRI supports interaction with new objects via zero-shot prediction, even without prior knowledge. NVP-HRI also integrates a large language model (LLM) for multimodal commands, coordinating them with object selection and scene distribution in real time to produce collision-free trajectory solutions. We further regulate the action sequence with an essential control syntax to reduce the risk of LLM hallucination. Evaluation on diverse real-world tasks with a Universal Robot demonstrated up to a 59.2% efficiency improvement over traditional gesture control, as illustrated in the video https://youtu.be/EbC7al2wiAc. Our code and design will be openly available at https://github.com/laiyuzhi/NVP-HRI.git.
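The abstract mentions regulating the LLM's action sequence with a control syntax to reduce hallucination risk. The paper's actual grammar and action set are not given in this record; the following is a minimal, hypothetical sketch of that idea, in which every LLM-emitted line must match a fixed `ACTION(object)` pattern before it can reach the robot controller. All names (`ALLOWED_ACTIONS`, the action verbs) are illustrative assumptions, not the authors' syntax.

```python
# Hypothetical control-syntax guard for LLM-generated robot plans.
# Any line that does not match the whitelisted ACTION(object) grammar
# is rejected, so a hallucinated verb never reaches the controller.
import re

# Illustrative action set; the paper's real syntax is not specified here.
ALLOWED_ACTIONS = {"PICK", "PLACE", "MOVE_TO"}

def parse_action_sequence(llm_output: str) -> list[tuple[str, str]]:
    """Validate an LLM-produced plan against the allowed control syntax.

    Returns a list of (action, object) pairs, or raises ValueError on
    the first line that falls outside the grammar.
    """
    plan = []
    for line in llm_output.strip().splitlines():
        match = re.fullmatch(r"(\w+)\((\w+)\)", line.strip())
        if not match or match.group(1) not in ALLOWED_ACTIONS:
            raise ValueError(f"rejected by control syntax: {line!r}")
        plan.append((match.group(1), match.group(2)))
    return plan

# A well-formed plan parses; a hallucinated action raises ValueError.
print(parse_action_sequence("PICK(cup)\nPLACE(table)"))
```

The value of this pattern is that validation happens before execution: a plan either conforms to the grammar in full or is discarded, which is one simple way to bound what an LLM can make a robot do.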
Pages: 14