PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2

Authors:

Zhang, Cong [1]

Liu, Tianshan [1]

Ju, Yakun [1]

Lam, Kin-Man [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

Keywords:

Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection;

DOI:

10.1109/ICIP49359.2023.10223093

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.

Pages: 1675-1679

Page count: 5
