Visual-Haptic-Kinesthetic Object Recognition with Multimodal Transformer

Cited by: 1
Authors
Zhou, Xinyuan [1]
Lan, Shiyong [1]
Wang, Wenwu [2]
Li, Xinyang [1]
Zhou, Siyuan [1]
Yang, Hongyu [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Surrey, Guildford GU2 7XH, Surrey, England
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII | 2023 / Vol. 14260
Keywords
Object Recognition; Multimodal Deep Learning; Multimodal Fusion; Attention Mechanism; TACTILE FUSION; NETWORK
DOI
10.1007/978-3-031-44195-0_20
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Humans recognize objects by combining multi-sensory information in a coordinated fashion. However, vision-based and haptic-based object recognition remain two separate research directions in robotics. Visual images and haptic time series have different properties, which makes them difficult for robots to fuse for object recognition as humans do. In this work, we propose an architecture that fuses visual, haptic and kinesthetic data for object recognition, based on multimodal Convolutional Recurrent Neural Networks with a Transformer. We use Convolutional Neural Networks (CNNs) to learn spatial representations, Recurrent Neural Networks (RNNs) to model temporal relationships, and the Transformer's self-attention and cross-attention structures to focus on global and cross-modal information. We propose two fusion methods and conduct experiments on the multimodal AU dataset. The results show that our model achieves higher accuracy than the latest multimodal object recognition methods. We conduct an ablation study on the individual input components to demonstrate the importance of multimodal information in object recognition. The code will be available at https://github.com/SYLan2019/VHKOR.
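The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, and the names `cross_attention`, `visual_tokens`, and `haptic_tokens`, as well as the toy feature values, are hypothetical. It only shows how tokens from one modality (queries) can be re-expressed as weighted mixtures of tokens from another modality (keys and values).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query token attends over all key tokens and returns a
    weighted average of the corresponding value tokens.
    Assumption: identity projections (no learned weight matrices).
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output dimension at a time.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return fused


# Hypothetical toy features: 2 visual tokens and 3 haptic time steps, each 2-D.
visual_tokens = [[1.0, 0.0], [0.0, 1.0]]
haptic_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Visual tokens query the haptic sequence (cross-modal attention).
fused = cross_attention(visual_tokens, haptic_tokens, haptic_tokens)
```

Self-attention is the special case where queries, keys, and values all come from the same modality, for example `cross_attention(haptic_tokens, haptic_tokens, haptic_tokens)`.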
Pages: 233-245
Page count: 13