The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Cited by: 0
Authors
Sun, Ning [1 ]
Xu, Wei [1 ]
Liu, Jixin [1 ]
Chai, Lei [1 ]
Sun, Haian [1 ]
Affiliation
[1] Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords
Feature extraction; Training; Image recognition; Transformers; Layout; Convolutional neural networks; Network
DOI
10.1109/MMUL.2024.3415643
CLC classification number
TP3 [Computing technology, computer technology]
Subject classification number
0812
Abstract
Scene recognition is a challenging task in computer vision because of the diversity of objects in scene images and the ambiguity of object layouts. In recent years, the emergence of multimodal scene data has provided new solutions for scene recognition, but it has also brought new problems. To address these challenges, the self-attention and distillation-based multimodal scene recognition network (SAD-MSR) is proposed in this article. The backbone of the model adopts the pure transformer structure of self-attention, which can extract local and global spatial features of multimodal scene images. A multistage fusion mechanism is developed for this model, in which the concatenated tokens of the two modalities are fused based on self-attention in the early stage, while the high-level features extracted from the two modalities are fused based on cross-attention in the late stage. Furthermore, a distillation mechanism is introduced to alleviate the problem of a limited number of training samples. Finally, we conducted extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, to show the effectiveness of SAD-MSR. Compared with other state-of-the-art multimodal scene recognition methods, our method achieves better experimental results.
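The multistage fusion described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' SAD-MSR implementation; the module names, token counts, embedding dimension, and class count are all hypothetical, chosen only to show the pattern of early fusion (self-attention over concatenated RGB and depth tokens) followed by late fusion (cross-attention between the modality-specific features):

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Late-stage fusion: RGB tokens query the depth tokens (hypothetical sketch)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feats, depth_feats):
        # Cross-attention: queries from RGB, keys/values from depth
        fused, _ = self.attn(query=rgb_feats, key=depth_feats, value=depth_feats)
        return self.norm(rgb_feats + fused)  # residual + norm

class TwoStageFusion(nn.Module):
    """Toy two-stage multimodal fusion; dims and class count are assumptions."""
    def __init__(self, dim=64, num_heads=4, num_classes=19):
        super().__init__()
        # Early stage: self-attention over the concatenated RGB+depth token sequence
        self.early = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        # Late stage: cross-attention between the two modality-specific features
        self.late = CrossAttentionFusion(dim, num_heads)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, rgb_tokens, depth_tokens):
        n = rgb_tokens.size(1)
        # Early fusion: concatenate tokens of both modalities, run self-attention
        joint = self.early(torch.cat([rgb_tokens, depth_tokens], dim=1))
        rgb_f, depth_f = joint[:, :n], joint[:, n:]
        # Late fusion: cross-attention between the high-level features
        fused = self.late(rgb_f, depth_f)
        # Mean-pool the fused tokens and classify the scene
        return self.head(fused.mean(dim=1))
```

Feeding two token sequences of shape `(batch, tokens, dim)` yields scene logits of shape `(batch, num_classes)`; the paper's distillation mechanism, which supervises this model with a teacher to offset limited training data, is omitted here for brevity.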
Pages: 25-36 (12 pages)