Visual-Haptic-Kinesthetic Object Recognition with Multimodal Transformer

Cited by: 1
Authors
Zhou, Xinyuan [1]
Lan, Shiyong [1]
Wang, Wenwu [2]
Li, Xinyang [1]
Zhou, Siyuan [1]
Yang, Hongyu [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Surrey, Guildford GU2 7XH, Surrey, England
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII | 2023 / Vol. 14260
Keywords
Object Recognition; Multimodal Deep Learning; Multimodal Fusion; Attention Mechanism; TACTILE FUSION; NETWORK;
DOI
10.1007/978-3-031-44195-0_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Humans recognize objects by combining multi-sensory information in a coordinated fashion. However, visual-based and haptic-based object recognition remain two separate research directions in robotics. Visual images and haptic time series have different properties, which makes it difficult for robots to fuse them for object recognition as humans do. In this work, we propose an architecture that fuses visual, haptic and kinesthetic data for object recognition, based on multimodal Convolutional Recurrent Neural Networks with a Transformer. We use Convolutional Neural Networks (CNNs) to learn spatial representations, Recurrent Neural Networks (RNNs) to model temporal relationships, and the Transformer's self-attention and cross-attention structures to focus on global and cross-modal information. We propose two fusion methods and conduct experiments on the multimodal AU dataset. The results show that our model offers higher accuracy than the latest multimodal object recognition methods. We also conduct an ablation study on the individual input components to demonstrate the importance of multimodal information in object recognition. The code will be available at https://github.com/SYLan2019/VHKOR.
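The abstract distinguishes self-attention (global information within one modality) from cross-attention (information exchanged between modalities). The sketch below illustrates only that distinction with plain scaled dot-product attention in pure Python. It is not the authors' model: the token values, the names `haptic_tokens` and `visual_tokens`, and the omission of learned query/key/value projections, CNN and RNN encoders, and multiple heads are all simplifying assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns the attended outputs and the attention weights.
    Learned projections are omitted (an assumption of this sketch).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values))
             for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy features: 2 haptic time steps and 3 visual patches,
# each already encoded as a vector of width 2.
haptic_tokens = [[1.0, 0.0], [0.0, 1.0]]
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: queries, keys and values come from the same modality.
haptic_self, self_weights = attention(
    haptic_tokens, haptic_tokens, haptic_tokens)

# Cross-attention: haptic queries attend over visual keys and values.
haptic_cross, cross_weights = attention(
    haptic_tokens, visual_tokens, visual_tokens)
```

In this toy setup each haptic token yields one output vector in both cases; the only difference is which modality supplies the keys and values. How the paper's two fusion methods combine such outputs is not specified in the abstract and is not modeled here.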
Pages: 233-245
Number of pages: 13