PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2
Authors
Zhang, Cong [1]
Liu, Tianshan [1]
Ju, Yakun [1]
Lam, Kin-Man [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection
DOI
10.1109/ICIP49359.2023.10223093
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Two obstacles hinder the advancement of vision Transformer-based aerial object detection: the scarcity of annotated samples and the difficulty of preserving multi-scale hierarchical representations. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most of these solutions focus on single-scale features, which conflicts with addressing the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM establishes pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of the method.
Pages: 1675-1679
Page count: 5