The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Cited: 0
Authors
Sun, Ning [1 ]
Xu, Wei [1 ]
Liu, Jixin [1 ]
Chai, Lei [1 ]
Sun, Haian [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Nanjing 210003, Peoples R China
Keywords
Feature extraction; Training; Image recognition; Transformers; Layout; Convolutional neural networks; Sun; NETWORK;
DOI
10.1109/MMUL.2024.3415643
CLC number (Chinese Library Classification)
TP3 [computing technology; computer technology]
Discipline code
0812
Abstract
Scene recognition is a challenging task in computer vision because of the diversity of objects in scene images and the ambiguity of object layouts. In recent years, the emergence of multimodal scene data has provided new solutions for scene recognition, but it has also brought new problems. To address these challenges, the self-attention and distillation-based multimodal scene recognition network (SAD-MSR) is proposed in this article. The backbone of the model adopts a pure self-attention transformer structure, which can extract local and global spatial features of multimodal scene images. A multistage fusion mechanism is developed for this model: in the early stage, the concatenated tokens of the two modalities are fused by self-attention, while in the late stage, the high-level features extracted from the two modalities are fused by cross attention. Furthermore, a distillation mechanism is introduced to alleviate the problem of a limited number of training samples. Finally, we conducted extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, to show the effectiveness of SAD-MSR. Compared with other state-of-the-art multimodal scene recognition methods, our method achieves better experimental results.
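The fusion scheme the abstract describes (early-stage self-attention over concatenated modality tokens, late-stage cross attention, plus a distillation loss) can be sketched as follows. This is a minimal single-head NumPy illustration, not the paper's implementation: the token counts, feature dimension, absence of learned projections, and the `kd_loss` helper (a standard temperature-softened KL distillation loss) are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention (single head, no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# Token sequences for the two modalities (sizes are illustrative).
rgb   = rng.standard_normal((16, 64))   # 16 RGB tokens, dim 64
depth = rng.standard_normal((16, 64))   # 16 depth tokens, dim 64

# Early-stage fusion: concatenate the two token sequences and self-attend,
# so every token can attend across both modalities.
tokens = np.concatenate([rgb, depth], axis=0)   # (32, 64)
early = attention(tokens, tokens, tokens)       # self-attention over both

# Late-stage fusion: cross attention, with one modality's high-level
# features as queries and the other's as keys/values.
late = attention(rgb, depth, depth)             # (16, 64)

# Distillation: soften teacher and student logits with temperature T and
# minimize their KL divergence (a common KD loss, sketched here).
def kd_loss(student_logits, teacher_logits, T=2.0):
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float((p * (np.log(p) - np.log(q))).sum(-1).mean() * T * T)
```

In this sketch the early branch lets low-level tokens of each modality attend to the other from the start, while the late branch aligns already-abstracted features; the paper combines both stages rather than choosing one.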
Pages: 25-36 (12 pages)