PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2
Authors
Zhang, Cong [1]
Liu, Tianshan [1]
Ju, Yakun [1]
Lam, Kin-Man [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection
DOI
10.1109/ICIP49359.2023.10223093
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
Pages: 1675-1679
Page count: 5
Related papers
50 in total
  • [21] Vision Transformer-Based Ensemble Learning for Hyperspectral Image Classification
    Liu, Jun
    Guo, Haoran
    He, Yile
    Li, Huali
    REMOTE SENSING, 2023, 15 (21)
  • [22] ProxyMatting: Transformer-based image matting via region proxy
    Li, Jide
    Yang, Kequan
    Wu, Yuanchen
    Ye, Xichen
    Yang, Hanqi
    Li, Xiaoqiang
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [23] DHFormer: A Vision Transformer-Based Attention Module for Image Dehazing
    Wasi, Abdul
    Shiney, O. Jeba
    COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT I, 2024, 2009 : 148 - 159
  • [24] Space-filling Curves for Modeling Spatial Context in Transformer-based Whole Slide Image Classification
    Erkan, Cihan
    Aksoy, Selim
    MEDICAL IMAGING 2023, 2023, 12471
  • [25] TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval
    Chen, Yongbiao
    Zhang, Sheng
    Liu, Fangxin
    Chang, Zhigang
    Ye, Mang
    Qi, Zhengwei
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 127 - 136
  • [26] Transformer-based unsupervised contrastive learning for histopathological image classification
    Wang, Xiyue
    Yang, Sen
    Zhang, Jun
    Wang, Minghui
    Zhang, Jing
    Yang, Wei
    Huang, Junzhou
    Han, Xiao
    MEDICAL IMAGE ANALYSIS, 2022, 81
  • [27] EINet: camouflaged object detection with pyramid vision transformer (vol 31, 053002, 2022)
    Li, Chen
    Jiao, Ge
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (05)
  • [28] Semantic-Constraint Matching for transformer-based weakly supervised object localization
    Cao, Yiwen
    Su, Yukun
    Wang, Wenjun
    Liu, Yanxia
    Wu, Qingyao
    PATTERN RECOGNITION, 2025, 158
  • [29] Vison Transformer-Based Automatic Crack Detection on Dam Surface
    Zhou, Jian
    Zhao, Guochuan
    Li, Yonglong
    WATER, 2024, 16 (10)
  • [30] A transformer-based hierarchical registration framework for multimodality deformable image registration
    Zhao, Yao
    Chen, Xinru
    Mcdonald, Brigid
    Yu, Cenji
    Mohamed, Abdalah S. R.
    Fuller, Clifton D.
    Court, Laurence E.
    Pan, Tinsu
    Wang, He
    Wang, Xin
    Phan, Jack
    Yang, Jinzhong
    COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2023, 108