Enhanced object recognition from remote sensing images based on hybrid convolution and transformer structure

Cited by: 0
Authors
Nguyen, Hoanh [1]
Ngo, Thanh Quyen [1]
Uyen, Hoang Thi Tu [1]
Duong, Mien Ka [1]
Affiliation
[1] Ind Univ Ho Chi Minh City, Fac Elect Engn Technol, Ho Chi Minh City 700000, Vietnam
Keywords
Object detection; Remote sensing images; Depthwise separable convolution; Attention mechanisms; NETWORK;
DOI
10.1007/s12145-025-01751-x
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Object recognition in remote sensing images presents unique challenges due to the diverse scales, shapes, and distributions of objects, particularly small and complex ones. Existing frameworks, such as RT-DETR, struggle to accurately detect small objects because of their limited ability to extract fine-grained details and integrate multi-scale information. To overcome these challenges, we propose an enhanced object recognition model based on a hybrid convolution and transformer structure. This model improves two critical components of the original RT-DETR by introducing the Multi-Scale Adaptive Attention Module (MSAAM) and the Hybrid Feature Fusion Module (HFFM), specifically designed to enhance feature extraction and integration. The MSAAM strengthens the ResNet backbone by adaptively combining local and global information, ensuring the effective extraction of fine-grained details while emphasizing features critical for small object detection. The HFFM, integrated into the final stages of the neck, employs a dual-branch design to balance fine-grained local detail extraction and large-scale contextual understanding. By employing group convolution, depthwise separable convolution, and attention mechanisms, the HFFM mitigates the loss of fine details caused by downsampling while leveraging the expanded receptive field for broader context understanding. Experimental results demonstrate that the proposed model achieves superior object recognition performance, particularly for small objects, making it well-suited for remote sensing applications.
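The abstract names group convolution and depthwise separable convolution as building blocks of the HFFM. As a rough illustration of why such layers are lighter than a standard convolution, the sketch below compares weight counts for the three layer types. It is a generic calculation, not the authors' implementation: the function names and the channel and kernel sizes are assumptions chosen for the example, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution split into `groups` groups (bias omitted)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only; these are not taken from the paper.
    c_in, c_out, k = 256, 256, 3
    print("standard:", conv_params(c_in, c_out, k))
    print("group (g=4):", conv_params(c_in, c_out, k, groups=4))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
```

With these assumed sizes, the standard layer has 589824 weights, the four-group layer 147456, and the depthwise separable pair 67840.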
Pages: 15
Related papers
50 in total
  • [1] Convolution and Transformer based hybrid neural network for Road Extraction in Remote Sensing Images
    Liu, Shufan
    Wang, Yang
    Wang, Haoqi
    Xiong, Youqiang
    Liu, Yinfeng
    Xie, Chenxi
    2024 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, ICMA 2024, 2024: 471 - 476
  • [2] Remote Sensing Object Detection Based on Convolution and Swin Transformer
    Jiang, Xuzhao
    Wu, Yonghong
    IEEE ACCESS, 2023, 11 : 38643 - 38656
  • [3] QAGA-Net: enhanced vision transformer-based object detection for remote sensing images
    Song, Huaxiang
    Xia, Hanjun
    Wang, Wenhui
    Zhou, Yang
    Liu, Wanbo
    Liu, Qun
    Liu, Jinling
    INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2025, 18 (01) : 133 - 152
  • [4] Transformer with large convolution kernel decoder network for salient object detection in optical remote sensing images
    Dong, Pengwei
    Wang, Bo
    Cong, Runmin
    Sun, Hai-Han
    Li, Chongyi
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 240
  • [5] Transformer Based Remote Sensing Object Detection With Enhanced Multispectral Feature Extraction
    Zhu, Jiahe
    Chen, Xu
    Zhang, Huan
    Tan, Zelong
    Wang, Shengjin
    Ma, Hongbing
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [6] Object Recognition in Remote Sensing Images Based on Modified Backpropagation Neural Network
    Raju, Manthena Narasimha
    Natarajan, Kumaran
    Vasamsetty, Chandra Sekhar
    TRAITEMENT DU SIGNAL, 2021, 38 (02) : 451 - 459
  • [7] MDCT: Multi-Kernel Dilated Convolution and Transformer for One-Stage Object Detection of Remote Sensing Images
    Chen, Juanjuan
    Hong, Hansheng
    Song, Bin
    Guo, Jie
    Chen, Chen
    Xu, Junjie
    REMOTE SENSING, 2023, 15 (02)
  • [8] HIERARCHICAL REGION BASED CONVOLUTION NEURAL NETWORK FOR MULTISCALE OBJECT DETECTION IN REMOTE SENSING IMAGES
    Li, Qingpeng
    Mou, Lichao
    Jiang, Kaiyu
    Liu, Qingjie
    Wang, Yunhong
    Zhu, Xiao Xiang
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018: 4355 - 4358
  • [9] RepSViT: An Efficient Vision Transformer Based on Spiking Neural Networks for Object Recognition in Satellite On-Orbit Remote Sensing Images
    Pang, Yanhua
    Yao, Libo
    Luo, Yiping
    Dong, Chengguo
    Kong, Qinglei
    Chen, Bo
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 16
  • [10] Salient Object Detection in Optical Remote Sensing Images Driven by Transformer
    Li, Gongyang
    Bai, Zhen
    Liu, Zhi
    Zhang, Xinpeng
    Ling, Haibin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 5257 - 5269