PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2
Authors
Zhang, Cong [1]
Liu, Tianshan [1]
Ju, Yakun [1]
Lam, Kin-Man [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection;
DOI
10.1109/ICIP49359.2023.10223093
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
Pages: 1675 - 1679
Number of pages: 5
Related papers
50 items in total
  • [31] Where in the World Is This Image? Transformer-Based Geo-localization in the Wild
    Pramanick, Shraman
    Nowara, Ewa M.
    Gleason, Joshua
    Castillo, Carlos D.
    Chellappa, Rama
    COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 196 - 215
  • [32] Scale-aware token-matching for transformer-based object detector
    Jung, Aecheon
    Hong, Sungeun
    Hyun, Yoonsuk
    PATTERN RECOGNITION LETTERS, 2024, 185 : 197 - 202
  • [33] MAFPN: a mixed local-global attention feature pyramid network for aerial object detection
    Ma, Tengfei
    Yin, Haitao
    REMOTE SENSING LETTERS, 2024, 15 (09) : 907 - 918
  • [34] Vision transformer-based autonomous crack detection on asphalt and concrete surfaces
    Shamsabadi, Elyas Asadi
    Xu, Chang
    Rao, Aravinda S.
    Nguyen, Tuan
    Ngo, Tuan
    Dias-da-Costa, Daniel
    AUTOMATION IN CONSTRUCTION, 2022, 140
  • [35] PSVT: Pyramid Shifted Window based Vision Transformer for cardiac image segmentation
    Zhang, Xingyu
    Liu, Jiacheng
    Xian, Xiaoli
    Chen, Bo
    Li, Dong
    Yang, Fei
    Zhang, Lei
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 102
  • [36] TransPath: Transformer-Based Self-supervised Learning for Histopathological Image Classification
    Wang, Xiyue
    Yang, Sen
    Zhang, Jun
    Wang, Minghui
    Zhang, Jing
    Huang, Junzhou
    Yang, Wei
    Han, Xiao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT VIII, 2021, 12908 : 186 - 195
  • [37] PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction
    Gan, Weijie
    Zhai, Qiuchen
    Mccann, Michael T.
    Cardona, Cristina Garcia
    Kamilov, Ulugbek S.
    Wohlberg, Brendt
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 539 - 547
  • [38] FBDPN: CNN-Transformer hybrid feature boosting and differential pyramid network for underwater object detection
    Ji, Xun
    Chen, Shijie
    Hao, Li-Ying
    Zhou, Jingchun
    Chen, Long
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 256
  • [39] autoSMIM: Automatic Superpixel-Based Masked Image Modeling for Skin Lesion Segmentation
    Wang, Zhonghua
    Lyu, Junyan
    Tang, Xiaoying
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (12) : 3501 - 3511
  • [40] MOODv2: Masked Image Modeling for Out-of-Distribution Detection
    Li, Jingyao
    Chen, Pengguang
    Yu, Shaozuo
    Liu, Shu
    Jia, Jiaya
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8994 - 9003