Visual-Haptic-Kinesthetic Object Recognition with Multimodal Transformer

Cited by: 1
Authors
Zhou, Xinyuan [1]
Lan, Shiyong [1]
Wang, Wenwu [2]
Li, Xinyang [1]
Zhou, Siyuan [1]
Yang, Hongyu [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Surrey, Guildford GU2 7XH, Surrey, England
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII | 2023 / Vol. 14260
Keywords
Object Recognition; Multimodal Deep Learning; Multimodal Fusion; Attention Mechanism; TACTILE FUSION; NETWORK;
DOI
10.1007/978-3-031-44195-0_20
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Humans recognize objects by combining multi-sensory information in a coordinated fashion. However, vision-based and haptic-based object recognition remain two separate research directions in robotics. Visual images and haptic time series have different properties, which makes it difficult for robots to fuse them for object recognition as humans do. In this work, we propose an architecture that fuses visual, haptic, and kinesthetic data for object recognition, based on multimodal Convolutional Recurrent Neural Networks with a Transformer. We use Convolutional Neural Networks (CNNs) to learn spatial representations, Recurrent Neural Networks (RNNs) to model temporal relationships, and the Transformer's self-attention and cross-attention structures to focus on global and cross-modal information. We propose two fusion methods and conduct experiments on the multimodal AU dataset. The results show that our model achieves higher accuracy than the latest multimodal object recognition methods. We also conduct an ablation study on the individual input components to demonstrate the importance of multimodal information in object recognition. The code will be available at https://github.com/SYLan2019/VHKOR.
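The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): it is plain scaled dot-product attention with no learned projections, and the function names, toy feature values, and dimensions below are all hypothetical. It only shows how tokens from one modality (here, visual) can query another (here, haptic) so that each output row is a weighted mix of the other modality's features.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query row attends over all key rows and returns the
    attention-weighted average of the corresponding value rows.
    Learned projection matrices are omitted (assumption for brevity).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


# Hypothetical toy features: 2 visual tokens and 3 haptic time steps, 4-dim each.
visual = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
haptic = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

# Visual tokens act as queries; haptic steps supply keys and values.
fused = cross_attention(visual, haptic, haptic)
```

In this sketch `fused` has one row per visual token, each a convex combination of the haptic rows. Self-attention is the special case where queries, keys, and values all come from the same modality.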
Pages: 233-245
Number of pages: 13