PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2

Authors:

Zhang, Cong [1]

Liu, Tianshan [1]

Ju, Yakun [1]

Lam, Kin-Man [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

Keywords:

Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection;

DOI:

10.1109/ICIP49359.2023.10223093

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.

Pages: 1675-1679

Number of pages: 5

50 items in total

[41] A Novel Driver Distraction Behavior Detection Method Based on Self-Supervised Learning With Masked Image Modeling
Zhang, Yingzhi
Li, Taiguo
Li, Chao
Zhou, Xinghong
IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (04) : 6056 - 6071
[42] AnoViT: Unsupervised Anomaly Detection and Localization With Vision Transformer-Based Encoder-Decoder
Lee, Yunseung
Kang, Pilsung
IEEE ACCESS, 2022, 10 : 46717 - 46724
[43] Contrastive Transformer-Based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection
Tian, Yu
Pang, Guansong
Liu, Fengbei
Liu, Yuyuan
Wang, Chong
Chen, Yuanhong
Verjans, Johan
Carneiro, Gustavo
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT III, 2022, 13433 : 88 - 98
[44] MaxCerVixT: A novel lightweight vision transformer-based Approach for precise cervical cancer detection
Pacal, Ishak
KNOWLEDGE-BASED SYSTEMS, 2024, 289
[45] MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
Cai, Pengfei
Song, Yan
Li, Kang
Song, Haoyu
McLoughlin, Ian
INTERSPEECH 2024, 2024 : 557 - 561
[46] YOLOPose: Transformer-Based Multi-object 6D Pose Estimation Using Keypoint Regression
Amini, Arash
Periyasamy, Arul Selvam
Behnke, Sven
INTELLIGENT AUTONOMOUS SYSTEMS 17, IAS-17, 2023, 577 : 392 - 406
[47] IntelPVT: intelligent patch-based pyramid vision transformers for object detection and classification
Divya Nimma
Zhaoxian Zhou
International Journal of Machine Learning and Cybernetics, 2024, 15 : 1767 - 1778
[48] YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery
Wu, Yiheng
Li, Jianjun
SENSORS, 2023, 23 (05)
[49] IntelPVT: intelligent patch-based pyramid vision transformers for object detection and classification
Nimma, Divya
Zhou, Zhaoxian
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (05) : 1767 - 1778
[50] RailTrack-DaViT: A Vision Transformer-Based Approach for Automated Railway Track Defect Detection
Phaphuangwittayakul, Aniwat
Harnpornchai, Napat
Ying, Fangli
Zhang, Jinming
JOURNAL OF IMAGING, 2024, 10 (08)