PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2

Authors:

Zhang, Cong [1]

Liu, Tianshan [1]

Ju, Yakun [1]

Lam, Kin-Man [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

Keywords:

Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection;

DOI:

10.1109/ICIP49359.2023.10223093

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.

Pages: 1675-1679

Number of pages: 5

