An Underwater Human-Robot Interaction Using a Visual-Textual Model for Autonomous Underwater Vehicles

Times Cited: 6
Authors
Zhang, Yongji [1 ]
Jiang, Yu [1 ,2 ]
Qi, Hong [1 ,2 ]
Zhao, Minghao [1 ]
Wang, Yuehang [1 ]
Wang, Kai [1 ]
Wei, Fenglin [1 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, State Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
autonomous underwater vehicle; underwater human-robot interaction; gesture recognition; visual-textual association;
DOI
10.3390/s23010197
CLC Number
O65 [Analytical Chemistry];
Subject Classification Codes
070302; 081704;
Abstract
The marine environment presents a unique set of challenges for human-robot interaction. Gesture-based communication is a common way for divers to interact with autonomous underwater vehicles (AUVs). However, underwater gesture recognition is a challenging visual task for AUVs due to light refraction and wavelength-dependent color attenuation. Current gesture recognition methods either classify the whole image directly or first locate the hand and then classify the hand features. These purely visual approaches largely ignore textual information. This paper proposes a visual-textual model for underwater hand gesture recognition (VT-UHGR). The VT-UHGR model encodes the underwater diver's image as visual features and the category text as textual features, and generates visual-textual features through multimodal interactions. We guide AUVs to use image-text matching for learning and inference. The proposed method outperforms most existing purely visual methods on the CADDY dataset, demonstrating the effectiveness of using textual patterns for underwater gesture recognition.
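The abstract describes inference by image-text matching: a gesture image is scored against a textual embedding for each category, and the best-matching category wins. Below is a minimal sketch of that matching step under stated assumptions, not the authors' implementation: it presumes the visual and textual embeddings have already been produced by pretrained encoders, and the function names are illustrative.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def image_text_match(image_feat, text_feats):
    """Pick the gesture category whose textual embedding is most similar
    to the image embedding (cosine-similarity image-text matching).

    image_feat : (d,) visual embedding of the diver image
    text_feats : (num_categories, d) textual embeddings, one per gesture label
    Returns (predicted_category_index, per-category similarity scores).
    """
    img = l2_normalize(image_feat)
    txt = l2_normalize(text_feats)
    scores = txt @ img  # one cosine similarity per category
    return int(np.argmax(scores)), scores

# Toy example with 2-D stand-in embeddings for two gesture categories.
pred, scores = image_text_match(
    np.array([1.0, 0.0]),                     # image embedding
    np.array([[0.9, 0.1], [0.0, 1.0]]),       # text embeddings per category
)
```

In practice the same scores can be softmax-scaled into per-category probabilities, which is how contrastive image-text models are typically used as zero-shot classifiers.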
Pages: 13