Head and Eye Egocentric Gesture Recognition for Human-Robot Interaction Using Eyewear Cameras

Cited by: 7
Authors
Marina-Miranda, Javier [1 ,2 ]
Javier Traver, V [3 ]
Affiliations
[1] Univ Jaume 1, E-12071 Castellon de La Plana, Spain
[2] HP Inc, Barcelona 08174, Spain
[3] Univ Jaume 1, Inst New Imaging Technol, E-12071 Castellon de La Plana, Spain
Keywords
Gesture, posture and facial expressions; Deep learning for visual perception
DOI
10.1109/LRA.2022.3180442
CLC number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
Non-verbal communication plays a particularly important role in a wide range of scenarios in Human-Robot Interaction (HRI). Accordingly, this work addresses the problem of human gesture recognition. In particular, we focus on head and eye gestures, and adopt an egocentric (first-person) perspective using eyewear cameras. We argue that this egocentric view may offer a number of conceptual and technical benefits over scene- or robot-centric perspectives. A motion-based recognition approach is proposed, which operates at two temporal granularities. Locally, frame-to-frame homographies are estimated with a convolutional neural network (CNN). The output of this CNN is input to a long short-term memory (LSTM) to capture longer-term temporal visual relationships, which are relevant to characterize gestures. Regarding the configuration of the network architecture, one particularly interesting finding is that using the output of an internal layer of the homography CNN increases the recognition rate with respect to using the homography matrix itself. While this work focuses on action recognition, and no robot or user study has been conducted yet, the system has been designed to meet real-time constraints. The encouraging results suggest that the proposed egocentric perspective is viable, and this proof-of-concept work provides novel and useful contributions to the exciting area of HRI.
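The abstract describes a two-stage pipeline: per-frame motion features (the output of an internal layer of a homography-estimation CNN) are fed to an LSTM that classifies the gesture over the whole sequence. The sketch below illustrates only the second, temporal stage; the feature dimensions, single-layer LSTM cell, number of gesture classes, and linear classifier head are illustrative assumptions, not the authors' implementation, and random vectors stand in for the CNN features.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell, forward pass only (illustrative, not the paper's model)."""

    def __init__(self, in_dim, hid_dim):
        self.hid = hid_dim
        # One stacked weight matrix for the four gates (input, forget, cell, output).
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        # Gates are computed jointly from the concatenated input and hidden state.
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # update cell state
        h = o * np.tanh(c)           # emit new hidden state
        return h, c

def classify_sequence(frame_features, cell, W_out):
    """Run the LSTM over per-frame features (standing in for the homography-CNN
    internal-layer output) and classify the gesture from the last hidden state."""
    h = np.zeros(cell.hid)
    c = np.zeros(cell.hid)
    for x in frame_features:
        h, c = cell.step(x, h, c)
    return int(np.argmax(W_out @ h))

# Hypothetical usage: 10 frames of 8-dim features, 4 gesture classes.
cell = LSTMCell(in_dim=8, hid_dim=16)
W_out = rng.standard_normal((4, 16)) * 0.1
features = [rng.standard_normal(8) for _ in range(10)]
predicted_class = classify_sequence(features, cell, W_out)
```

The design choice mirrors the paper's granularity split: the CNN handles short-term (frame-to-frame) motion, while the recurrence accumulates the longer-term temporal context needed to characterize a full head or eye gesture.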
Pages: 7067-7074
Page count: 8
Related papers
46 records in total
[1] Alallah F., Neshati A., Sakamoto Y., Hasan K., Lank E., Bunt A., Irani P. "Performer vs. Observer: Whose Comfort Level Should We Consider when Examining the Social Acceptability of Input Modalities for Head-Worn Display?" 24th ACM Symposium on Virtual Reality Software and Technology (VRST 2018), 2018.
[2] Avdic M., Marquardt N., Rogers Y., Vermeulen J. "Machine Body Language: Expressing a Smart Speaker's Activity with Intelligible Physical Motion." Proceedings of the 2021 ACM Designing Interactive Systems Conference (DIS 2021), 2021, pp. 1403-1418.
[3] Bentz W. 2019 IEEE International Conference on Robotics and Automation (ICRA), 2019, p. 3003, DOI 10.1109/ICRA.2019.8793587.
[4] Breazeal C., Kidd C. D., Thomaz A. L., Hoffman G., Berlin M. "Effects of nonverbal communication on efficiency and robustness in human-robot teamwork." 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2005, pp. 383-388.
[5] Breazeal C. Springer Handbook of Robotics, 2016, p. 1935.
[6] Brock H. 2020 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2020, p. 891, DOI 10.1109/RO-MAN47096.2020.9223566.
[7] Carfi A., Mastrogiovanni F. "Gesture-Based Human-Machine Interaction: Taxonomy, Problem Definition, and Analysis." IEEE Transactions on Cybernetics, 2023, 53(1): 497-513.
[8] Carrara F., Elias P., Sedmidubsky J., Zezula P. "LSTM-based real-time action detection and prediction in human motion streams." Multimedia Tools and Applications, 2019, 78(19): 27309-27331.
[9] Cochet H., Vauclair J. "Deictic gestures and symbolic gestures produced by adults in an experimental context: Hand shapes and hand preferences." Laterality, 2014, 19(3): 278-301.
[10] Craig T. L., 2016, PARTIAL BENDERS DECO, p. 1.