The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Times cited: 0
Authors
Sun, Ning [1]
Xu, Wei [1]
Liu, Jixin [1]
Chai, Lei [1]
Sun, Haian [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Nanjing 210003, Peoples R China
Keywords
Feature extraction; Training; Image recognition; Transformers; Layout; Convolutional neural networks; Sun; NETWORK
DOI
10.1109/MMUL.2024.3415643
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scene recognition is a challenging task in computer vision because of the diversity of objects in scene images and the ambiguity of object layouts. In recent years, the emergence of multimodal scene data has provided new solutions for scene recognition, but it has also brought new problems. To address these challenges, the self-attention and distillation-based multimodal scene recognition network (SAD-MSR) is proposed in this article. The backbone of the model adopts the pure transformer structure of self-attention, which can extract local and global spatial features of multimodal scene images. A multistage fusion mechanism was developed for this model in which the concatenated tokens of two modalities are fused based on self-attention in the early stage, while the high-level features extracted from the two modalities are fused based on cross attention in the late stage. Furthermore, a distillation mechanism is introduced to alleviate the problem of a limited number of training samples. Finally, we conducted extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, to show the effectiveness of SAD-MSR. Compared with other state-of-the-art multimodal scene recognition methods, our method can achieve better experimental results.
Pages: 25-36
Number of pages: 12
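
The abstract describes three mechanisms: early fusion by self-attention over the concatenated tokens of two modalities, late fusion by cross attention between high-level features, and a distillation loss. The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. It assumes single-head attention with no learned projections, and it substitutes a generic soft-label distillation loss (KL divergence on temperature-softened outputs) because the record does not specify the form of distillation used in SAD-MSR. All function names are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention (assumption: no learned projections)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def early_fusion(rgb_tokens, depth_tokens):
    """Self-attention over the concatenated token sequences of both modalities."""
    tokens = rgb_tokens + depth_tokens
    return attention(tokens, tokens, tokens)


def late_fusion(rgb_feats, depth_feats):
    """Cross attention: each modality queries the other's features."""
    rgb_attends_depth = attention(rgb_feats, depth_feats, depth_feats)
    depth_attends_rgb = attention(depth_feats, rgb_feats, rgb_feats)
    return rgb_attends_depth + depth_attends_rgb


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs (assumed, generic form)."""
    p = softmax([z / temperature for z in teacher_logits])
    q = softmax([z / temperature for z in student_logits])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage: 3 tokens per modality, 4 dimensions per token.
random.seed(0)
rgb = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
depth = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
fused_early = early_fusion(rgb, depth)  # 6 fused tokens
fused_late = late_fusion(rgb, depth)    # 6 cross-attended tokens
```

In a real model each attention call would use learned query, key, and value projections and multiple heads; the sketch only shows how the token sets are combined at each stage.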
Related papers
50 in total
  • [1] Finger Vein Recognition Based on ResNet With Self-Attention
    Zhang, Zhibo
    Chen, Guanghua
    Zhang, Weifeng
    Wang, Huiyang
    IEEE ACCESS, 2024, 12 : 1943 - 1951
  • [2] An Intelligent Point Cloud Recognition Method for Substation Equipment Based on Multiscale Self-Attention
    Shen, Xiaojun
    Xu, Zelin
    Wang, Mei
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [3] ESAformer: Enhanced Self-Attention for Automatic Speech Recognition
    Li, Junhua
    Duan, Zhikui
    Li, Shiren
    Yu, Xinmei
    Yang, Guangguang
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 471 - 475
  • [4] Parallel Self-Attention and Spatial-Attention Fusion for Human Pose Estimation and Running Movement Recognition
    Wu, Qingtian
    Zhang, Yu
    Zhang, Liming
    Yu, Haoyong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (01) : 358 - 368
  • [5] Polarimetric HRRP Recognition Based on ConvLSTM With Self-Attention
    Zhang, Liang
    Li, Yang
    Wang, Yanhua
    Wang, Junfu
    Long, Teng
    IEEE SENSORS JOURNAL, 2021, 21 (06) : 7884 - 7898
  • [6] Depth Privileged Scene Recognition via Dual Attention Hallucination
    Chen, Junjie
    Niu, Li
    Zhang, Liqing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 9164 - 9178
  • [7] Recognition of piglet postures based on self-attention mechanism and anchor-free method
    Xu C.
    Xue Y.
    Zheng C.
    Hou W.
    Guo J.
    Wang X.
    Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, 2022, 38 (14) : 166 - 173
  • [8] Masked Face Recognition With Mask Transfer and Self-Attention Under the COVID-19 Pandemic
    Zhang, Meng
    Liu, Rujie
    Deguchi, Daisuke
    Murase, Hiroshi
    IEEE ACCESS, 2022, 10 : 20527 - 20538
  • [9] A Coarse-to-Fine Facial Landmark Detection Method Based on Self-attention Mechanism
    Gao, Pengcheng
    Lu, Ke
    Xue, Jian
    Shao, Ling
    Lyu, Jiayi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 926 - 938
  • [10] CATFace: Cross-Attribute-Guided Transformer With Self-Attention Distillation for Low-Quality Face Recognition
    Alipour Talemi, Niloufar
    Kashiani, Hossein
    Nasrabadi, Nasser M.
    IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2024, 6 (01) : 132 - 146