Scale-aware token-matching for transformer-based object detector

Times cited: 1

Authors:

Jung, Aecheon [1]

Hong, Sungeun [1]

Hyun, Yoonsuk [2]

Affiliations:

[1] Sungkyunkwan Univ, Dept Immers Media Engn, Seoul, South Korea

[2] Inha Univ, Dept Math, Incheon, South Korea

Source:

PATTERN RECOGNITION LETTERS | 2024 / Vol. 185

Funding:

National Research Foundation of Singapore

Keywords:

Vision transformer; Object detection; Small object detection

DOI:

10.1016/j.patrec.2024.08.006

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Owing to advances in deep learning, object detection has made significant progress in estimating the positions and classes of multiple objects in an image. However, detecting objects of widely varying scales within a single image remains challenging. In this study, we propose scale-aware token matching for predicting object positions and classes in transformer-based object detection. Unlike previous methods, which match detection tokens to the ground truth without considering scale during training, we train the model by matching detection tokens to ground-truth objects according to object size. We divide a single set of detection tokens into multiple scale-specific sets and match each set to the ground truth differently, so the model is trained without additional computational cost. The experimental results demonstrate that scale information can be assigned to tokens. With a novel loss function, the scale-aware tokens independently learn scale-specific information, which improves detection performance on small objects.
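The abstract does not spell out the matching rule, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' implementation. We assume the N detection tokens are split into three equal contiguous groups (small, medium, large); ground-truth boxes are binned by area using the COCO thresholds of 32² and 96² pixels; the matching cost is a plain L1 distance between boxes; and a brute-force exact assignment stands in for the Hungarian matching commonly used in DETR-style detectors. All function names (`scale_of`, `l1_cost`, `min_cost_assignment`, `scale_aware_match`) are hypothetical, and the paper's novel loss function is not modeled.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Assumption: three scale bins, one token group each, COCO-style area thresholds.
SCALE_BINS = ("small", "medium", "large")


def scale_of(box: Box) -> str:
    """Assign a ground-truth box to a scale bin by its area (COCO convention)."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def l1_cost(pred: Box, gt: Box) -> float:
    """Assumed matching cost: L1 distance between box coordinates."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def min_cost_assignment(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Exact min-cost matching of rows (ground truths) to columns (tokens).

    Brute-force stand-in for the Hungarian algorithm; only for tiny inputs.
    """
    if not cost:
        return []
    n_gt, n_tok = len(cost), len(cost[0])
    best, best_perm = float("inf"), None
    for perm in permutations(range(n_tok), n_gt):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best:
            best, best_perm = total, perm
    return list(enumerate(best_perm))


def scale_aware_match(pred_boxes: Sequence[Box], gt_boxes: Sequence[Box]) -> Dict[int, int]:
    """Match each token group only against ground truths of its own scale.

    Returns {gt_index: token_index}. Leftover tokens (if N is not divisible
    by the number of bins) are simply left unmatched in this sketch.
    """
    group_size = len(pred_boxes) // len(SCALE_BINS)
    matches: Dict[int, int] = {}
    for g, name in enumerate(SCALE_BINS):
        token_ids = list(range(g * group_size, (g + 1) * group_size))
        gt_ids = [i for i, b in enumerate(gt_boxes) if scale_of(b) == name]
        if len(gt_ids) > len(token_ids):
            raise ValueError(f"more {name} objects than {name} tokens")
        cost = [[l1_cost(pred_boxes[t], gt_boxes[i]) for t in token_ids] for i in gt_ids]
        for row, col in min_cost_assignment(cost):
            matches[gt_ids[row]] = token_ids[col]
    return matches


if __name__ == "__main__":
    # Six tokens: 0-1 small group, 2-3 medium group, 4-5 large group.
    preds = [(100, 100, 110, 110), (1, 1, 11, 11), (0, 0, 48, 52),
             (300, 300, 350, 350), (500, 500, 700, 700), (2, 2, 198, 202)]
    gts = [(0, 0, 10, 10), (0, 0, 50, 50), (0, 0, 200, 200)]
    print(scale_aware_match(preds, gts))  # {0: 1, 1: 2, 2: 5}
```

Because each ground-truth box can only be matched by tokens in its own scale group, the small object here is always supervised through a "small" token, while the total number of tokens (and so the matching cost per image) stays the same as with a single undivided set.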

Pages: 197-202

Page count: 6