Scale-aware token-matching for transformer-based object detector

Cited by: 1

Authors:

Jung, Aecheon [1]

Hong, Sungeun [1]

Hyun, Yoonsuk [2]

Affiliations:

[1] Sungkyunkwan Univ, Dept Immers Media Engn, Seoul, South Korea

[2] Inha Univ, Dept Math, Incheon, South Korea

Source:

Pattern Recognition Letters | 2024 / Vol. 185

Funding:

National Research Foundation, Singapore;

Keywords:

Vision transformer; Object detection; Small object detection;

DOI:

10.1016/j.patrec.2024.08.006

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Owing to advances in deep learning, object detection has made significant progress in estimating the positions and classes of multiple objects within an image. However, detecting objects of various scales within a single image remains challenging. In this study, we propose scale-aware token matching for predicting the positions and classes of objects in transformer-based object detection. We train the model by matching detection tokens with ground-truth objects according to object size, unlike previous methods, which perform matching without considering scale during training. We divide one set of detection tokens into multiple sets based on scale and match each token set with the ground truth differently, thereby training the model without additional computational cost. The experimental results demonstrate that scale information can be assigned to tokens. With a novel loss function, scale-aware tokens can independently learn scale-specific information, which improves detection performance on small objects.
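The idea of splitting detection tokens into scale groups and matching each group only against ground-truth boxes of the corresponding size can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' method: the abstract does not give the scale thresholds (COCO-style area bounds are used as a placeholder), the matching cost (plain L1 box distance here), or how tokens are partitioned (contiguous index ranges here). A brute-force search stands in for a proper assignment solver so that the example needs only the standard library.

```python
from itertools import permutations

# Hypothetical area bounds for small / medium / large objects (COCO-style);
# the abstract does not state which thresholds the paper uses.
SCALE_BOUNDS = [(0.0, 32.0 ** 2), (32.0 ** 2, 96.0 ** 2), (96.0 ** 2, float("inf"))]


def box_area(box):
    """Area of an (x, y, w, h) box."""
    return box[2] * box[3]


def l1_cost(pred, gt):
    """Assumed matching cost: L1 distance between box coordinates."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def best_assignment(preds, gts):
    """Brute-force one-to-one matching that minimises the total cost.

    Returns (pred_index, gt_index) pairs. Only practical for tiny inputs;
    a real implementation would use an assignment solver instead.
    """
    if len(gts) > len(preds):
        raise ValueError("more ground-truth boxes than tokens in this group")
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(p, g) for g, p in enumerate(best)]


def scale_aware_match(pred_boxes, gt_boxes, group_sizes):
    """Match each token group only to ground truth in its own scale range.

    pred_boxes holds one predicted box per detection token; the first
    group_sizes[0] tokens form the "small" group, the next the "medium"
    group, and so on (an assumed partition).
    """
    matches = []
    start = 0
    for (lo, hi), size in zip(SCALE_BOUNDS, group_sizes):
        token_ids = list(range(start, start + size))
        start += size
        gt_ids = [j for j, b in enumerate(gt_boxes) if lo <= box_area(b) < hi]
        local = best_assignment(
            [pred_boxes[i] for i in token_ids],
            [gt_boxes[j] for j in gt_ids],
        )
        matches.extend((token_ids[p], gt_ids[g]) for p, g in local)
    return sorted(matches)


# Six tokens in three groups of two; three ground-truth boxes, one per scale.
pred_boxes = [
    (10, 10, 20, 20), (50, 50, 10, 10),        # tokens 0-1: small group
    (100, 100, 60, 60), (0, 0, 50, 50),        # tokens 2-3: medium group
    (0, 0, 200, 200), (30, 30, 150, 150),      # tokens 4-5: large group
]
gt_boxes = [(48, 52, 12, 12), (5, 5, 48, 52), (28, 32, 160, 140)]

matches = scale_aware_match(pred_boxes, gt_boxes, [2, 2, 2])
print(matches)
```

Because each group is matched against a subset of the ground truth, the total number of tokens is unchanged, which is in line with the abstract's claim of no additional computational cost. How unmatched tokens are handled and how the loss is formed are not described in the abstract and are left out of the sketch.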


Pages: 197-202

Page count: 6
