Scale-aware token-matching for transformer-based object detector

Cited by: 1
Authors
Jung, Aecheon [1]
Hong, Sungeun [1]
Hyun, Yoonsuk [2]
Affiliations
[1] Sungkyunkwan Univ, Dept Immers Media Engn, Seoul, South Korea
[2] Inha Univ, Dept Math, Incheon, South Korea
Funding
National Research Foundation of Singapore
Keywords
Vision transformer; Object detection; Small object detection
DOI
10.1016/j.patrec.2024.08.006
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Owing to advances in deep learning, object detection has made significant progress in estimating the positions and classes of multiple objects within an image. However, detecting objects of various scales within a single image remains challenging. In this study, we propose scale-aware token matching for predicting the positions and classes of objects in transformer-based object detection. Unlike previous methods, which match detection tokens with the ground truth without considering scale during training, we match detection tokens with ground-truth objects according to object size. We divide a single detection token set into multiple sets based on scale and match each token set differently with the ground truth, thereby training the model without additional computational cost. The experimental results demonstrate that scale information can be assigned to tokens. Using a novel loss function, scale-aware tokens can independently learn scale-specific information, which improves detection performance on small objects.
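The abstract describes splitting one detection token set into scale-specific groups and matching each group with the ground truth separately, but it does not give the scale thresholds, the partitioning rule, or the matching cost. The following is only a minimal sketch under assumed choices, not the authors' implementation: COCO-style area thresholds (32² and 96² pixels), equal-size contiguous token groups, an L1 box cost, and a brute-force search standing in for the Hungarian matching commonly used in DETR-style detectors. All function names here are hypothetical.

```python
from itertools import permutations

# ASSUMPTION: COCO-style area thresholds in pixels^2; the paper's bins may differ.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def scale_bin(box):
    """Return 0 (small), 1 (medium), or 2 (large) for an (x, y, w, h) box."""
    area = box[2] * box[3]
    if area < SMALL_MAX_AREA:
        return 0
    if area < MEDIUM_MAX_AREA:
        return 1
    return 2


def l1_cost(pred, gt):
    """ASSUMPTION: illustrative matching cost, L1 distance between box tuples."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def match_group(token_ids, gt_ids, preds, gts):
    """Brute-force minimum-cost one-to-one matching (only viable for tiny inputs)."""
    if not gt_ids:
        return []
    if len(gt_ids) > len(token_ids):
        raise ValueError("more ground-truth boxes than tokens in this scale group")
    best_cost, best_pairs = float("inf"), []
    for chosen in permutations(token_ids, len(gt_ids)):
        cost = sum(l1_cost(preds[t], gts[g]) for t, g in zip(chosen, gt_ids))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(zip(chosen, gt_ids))
    return best_pairs


def scale_aware_match(preds, gts, num_bins=3):
    """Match each contiguous token group only to ground truth of the same scale bin."""
    per_group = len(preds) // num_bins
    pairs = []
    for b in range(num_bins):
        start = b * per_group
        end = len(preds) if b == num_bins - 1 else start + per_group
        token_ids = list(range(start, end))
        gt_ids = [i for i, gt in enumerate(gts) if scale_bin(gt) == b]
        pairs.extend(match_group(token_ids, gt_ids, preds, gts))
    return pairs


# Six tokens: ids 0-1 are "small" tokens, 2-3 "medium", 4-5 "large".
preds = [(0, 0, 10, 10), (12, 9, 18, 22), (100, 100, 50, 50),
         (0, 0, 60, 60), (55, 48, 190, 160), (300, 300, 100, 100)]
gts = [(10, 10, 20, 20), (50, 50, 200, 150)]  # one small box, one large box
print(scale_aware_match(preds, gts))  # [(1, 0), (4, 1)] as (token_id, gt_id) pairs
```

In this sketch the small ground-truth box can only be claimed by tokens 0 and 1, and the large one only by tokens 4 and 5, so each group sees only objects of its own scale. Because the split is just a partition of the existing tokens, the total number of tokens stays the same.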
Pages: 197-202
Number of pages: 6