PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2
Authors
Zhang, Cong [1]
Liu, Tianshan [1]
Ju, Yakun [1]
Lam, Kin-Man [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection;
DOI
10.1109/ICIP49359.2023.10223093
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
Pages: 1675-1679
Number of pages: 5
Related papers
  • [1] ATTENTION-GUIDED CONTRASTIVE MASKED IMAGE MODELING FOR TRANSFORMER-BASED SELF-SUPERVISED LEARNING
    Zhan, Yucheng
    Zhao, Yucheng
    Luo, Chong
    Zhang, Yueyi
    Sun, Xiaoyan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2490 - 2494
  • [2] Transformer-Based Masked Autoencoder With Contrastive Loss for Hyperspectral Image Classification
    Cao, Xianghai
    Lin, Haifeng
    Guo, Shuaixu
    Xiong, Tao
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [3] Object Detection of Road Assets Using Transformer-Based YOLOX with Feature Pyramid Decoder on Thai Highway Panorama
    Panboonyuen, Teerapong
    Thongbai, Sittinun
    Wongweeranimit, Weerachai
    Santitamnont, Phisan
    Suphan, Kittiwan
    Charoenphon, Chaiyut
    INFORMATION, 2022, 13 (01)
  • [4] Transformer-based hierarchical dynamic decoders for salient object detection
    Zheng, Qingping
    Zheng, Ling
    Deng, Jiankang
    Li, Ying
    Shang, Changjing
    Shen, Qiang
    KNOWLEDGE-BASED SYSTEMS, 2023, 282
  • [5] Rotated-DETR: an End-to-End Transformer-based Oriented Object Detector for Aerial Images
    Kim, Jinbeom
    Lee, Giljun
    Kim, Taejune
    Woo, Simon S.
    38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023, 2023, : 1248 - 1255
  • [6] A Novel Multi-Scale Transformer for Object Detection in Aerial Scenes
    Lu, Guanlin
    He, Xiaohui
    Wang, Qiang
    Shao, Faming
    Wang, Hongwei
    Wang, Jinkang
    DRONES, 2022, 6 (08)
  • [7] Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection
    Zhang, Tong
    Zhuang, Yin
    Chen, He
    Chen, Liang
    Wang, Guanqun
    Gao, Peng
    Dong, Hao
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 5013 - 5025
  • [8] Vision Transformer-based Real-Time Camouflaged Object Detection System at Edge
    Putatunda, Rohan
    Khan, Md Azim
    Gangopadhyay, Aryya
    Wang, Jianwu
    Busart, Carl
    Erbacher, Robert F.
    2023 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING, SMARTCOMP, 2023, : 90 - 97
  • [9] OPTICAL SATELLITE IMAGE CHANGE DETECTION VIA TRANSFORMER-BASED SIAMESE NETWORK
    Wu, Yang
    Wang, Yuyao
    Li, Yanheng
    Xu, Qizhi
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1436 - 1439
  • [10] Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
    Xie, Tianming
    Zhang, Zhonghao
    Tian, Jing
    Ma, Lihong
    SENSORS, 2022, 22 (22)