The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Cited by: 0
Authors
Sun, Ning [1]
Xu, Wei [1]
Liu, Jixin [1]
Chai, Lei [1]
Sun, Haian [1]
Affiliation
[1] Nanjing Univ Posts & Telecommun, Nanjing 210003, Peoples R China
Keywords
Feature extraction; Training; Image recognition; Transformers; Layout; Convolutional neural networks; Sun; NETWORK
DOI
10.1109/MMUL.2024.3415643
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
Scene recognition is a challenging task in computer vision because scene images contain diverse objects arranged in ambiguous layouts. In recent years, multimodal scene data have opened new avenues for scene recognition, but they have also introduced new problems. To address these challenges, this article proposes the self-attention and distillation-based multimodal scene recognition network (SAD-MSR). The backbone of the model is a pure self-attention transformer that extracts both local and global spatial features from multimodal scene images. The model uses a multistage fusion mechanism: in the early stage, the concatenated tokens of the two modalities are fused by self-attention, and in the late stage, the high-level features extracted from each modality are fused by cross-attention. Furthermore, a distillation mechanism is introduced to alleviate the problem of limited training samples. Finally, extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, demonstrate the effectiveness of SAD-MSR, which achieves better experimental results than other state-of-the-art multimodal scene recognition methods.
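To make the two fusion stages and the distillation step concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' SAD-MSR implementation. The single-head attention without LayerNorm or MLP blocks, the mean pooling, the token counts and dimensions, the random weights, the choice of RGB and depth as the two modalities, and all helper names (`attention`, `early_fusion`, `late_fusion`, `distillation_loss`) are hypothetical. The abstract does not say which distillation variant the paper uses, so the standard soft-target knowledge distillation loss stands in for it here; the actual mechanism may differ.

```python
import numpy as np

# Hypothetical sketch of early (self-attention) and late (cross-attention) fusion
# plus a soft-target distillation loss. Assumptions: single-head attention,
# no LayerNorm/MLP, random untrained weights, RGB and depth as the two modalities.
# This is NOT the authors' SAD-MSR code.


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, context, wq, wk, wv):
    """Single-head scaled dot-product attention.

    Self-attention when `context` is `queries`; cross-attention otherwise.
    """
    q, k, v = queries @ wq, context @ wk, context @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_queries, n_context)
    return weights @ v


def early_fusion(rgb_tokens, depth_tokens, w):
    """Early stage: concatenate both modalities' tokens, then self-attend."""
    tokens = np.concatenate([rgb_tokens, depth_tokens], axis=0)
    return tokens + attention(tokens, tokens, *w)  # residual connection


def late_fusion(rgb_feats, depth_feats, w_rgb, w_depth):
    """Late stage: each modality's features attend to the other (cross-attention)."""
    rgb_out = rgb_feats + attention(rgb_feats, depth_feats, *w_rgb)
    depth_out = depth_feats + attention(depth_feats, rgb_feats, *w_depth)
    # Mean-pool each stream and concatenate into one scene descriptor.
    return np.concatenate([rgb_out.mean(axis=0), depth_out.mean(axis=0)])


def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Soft-target KD: hard-label cross-entropy plus T^2-scaled KL(teacher || student)."""
    ce = -np.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    return (1 - alpha) * ce + alpha * T**2 * kl


rng = np.random.default_rng(0)
d, n_tokens, n_classes = 8, 4, 5  # hypothetical sizes


def make_w():
    """Random query/key/value projections (stand-ins for learned weights)."""
    return tuple(rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))


rgb = rng.normal(size=(n_tokens, d))    # RGB patch tokens
depth = rng.normal(size=(n_tokens, d))  # depth patch tokens

fused_tokens = early_fusion(rgb, depth, make_w())  # shape (2 * n_tokens, d)
# Split back into per-modality streams (intermediate encoder layers omitted).
rgb_feats, depth_feats = fused_tokens[:n_tokens], fused_tokens[n_tokens:]
scene_vec = late_fusion(rgb_feats, depth_feats, make_w(), make_w())  # shape (2 * d,)

w_cls = rng.normal(scale=(2 * d) ** -0.5, size=(2 * d, n_classes))
student_logits = scene_vec @ w_cls
teacher_logits = rng.normal(size=n_classes)  # placeholder for a trained teacher
loss = distillation_loss(student_logits, teacher_logits, label=2)
print(fused_tokens.shape, scene_vec.shape, float(loss))
```

Splitting the early-fused sequence back into its RGB and depth halves stands in for the per-modality encoder layers that would sit between the two fusion stages in a full model. In a real system the projections would be learned and the teacher logits would come from a trained network rather than random numbers.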
Pages: 25-36
Number of pages: 12