Scale-aware token-matching for transformer-based object detector

Cited by: 1
Authors
Jung, Aecheon [1]
Hong, Sungeun [1]
Hyun, Yoonsuk [2]
Affiliations
[1] Sungkyunkwan Univ, Dept Immers Media Engn, Seoul, South Korea
[2] Inha Univ, Dept Math, Incheon, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Vision transformer; Object detection; Small object detection;
DOI
10.1016/j.patrec.2024.08.006
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Owing to advances in deep learning, object detection has made significant progress in estimating the positions and classes of multiple objects within an image. However, detecting objects of various scales within a single image remains challenging. In this study, we propose scale-aware token matching for predicting the positions and classes of objects in transformer-based object detection. Unlike previous methods, which match detection tokens with the ground truth without considering scale during training, we train the model by matching detection tokens with ground-truth objects according to their size. We divide one detection token set into multiple sets based on scale and match each token set differently with the ground truth, thereby training the model without additional computational cost. The experimental results demonstrate that scale information can be assigned to tokens. By using a novel loss function, scale-aware tokens can independently learn scale-specific information, which improves detection performance on small objects.
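As a rough illustration of the matching idea described in the abstract, the sketch below splits one set of detection tokens into scale groups and matches each group only against ground-truth boxes of the corresponding scale. It is not the authors' implementation. The COCO-style area thresholds, the equal contiguous split of tokens, the L1 box cost, the brute-force search (standing in for Hungarian matching), and all function names are assumptions made for this example.

```python
from itertools import permutations

# Assumed COCO-style area thresholds in pixels^2 (not taken from the paper).
SMALL_MAX = 32 ** 2
MEDIUM_MAX = 96 ** 2


def scale_of(box):
    """Return 0 (small), 1 (medium) or 2 (large) for an (x, y, w, h) box."""
    area = box[2] * box[3]
    if area < SMALL_MAX:
        return 0
    if area < MEDIUM_MAX:
        return 1
    return 2


def l1_cost(pred, gt):
    """Hypothetical matching cost: L1 distance between box coordinates."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def match_group(token_ids, gt_ids, preds, gts):
    """Brute-force minimum-cost one-to-one assignment within one scale group.

    Assumes the group holds at least as many tokens as ground-truth boxes;
    otherwise no assignment exists and an empty list is returned.
    """
    best, best_cost = [], float("inf")
    if not gt_ids:
        return best
    for perm in permutations(token_ids, len(gt_ids)):
        cost = sum(l1_cost(preds[t], gts[g]) for t, g in zip(perm, gt_ids))
        if cost < best_cost:
            best, best_cost = list(zip(perm, gt_ids)), cost
    return best


def scale_aware_match(preds, gts, num_groups=3):
    """Split tokens into contiguous scale groups and match each separately.

    Returns (token_index, gt_index) pairs sorted by gt_index.
    """
    per_group = len(preds) // num_groups
    matches = []
    for s in range(num_groups):
        token_ids = list(range(s * per_group, (s + 1) * per_group))
        gt_ids = [i for i, g in enumerate(gts) if scale_of(g) == s]
        matches += match_group(token_ids, gt_ids, preds, gts)
    return sorted(matches, key=lambda m: m[1])


# Six predicted boxes: tokens 0-1 form the small group, 2-3 medium, 4-5 large.
preds = [
    (10, 10, 20, 20), (100, 100, 10, 10),
    (50, 50, 60, 60), (200, 200, 50, 50),
    (0, 0, 300, 300), (100, 100, 200, 200),
]
# One ground-truth box per scale (areas 144, 3840 and 39900 pixels^2).
gts = [(98, 102, 12, 12), (48, 52, 64, 60), (90, 110, 210, 190)]

matches = scale_aware_match(preds, gts)
print(matches)  # [(1, 0), (2, 1), (5, 2)]
```

Because each ground-truth box is only ever compared with tokens from its own scale group, the per-group searches together cover no more token/box pairs than a single global matching would, which is consistent with the abstract's claim of no additional computation cost. A practical version would replace the brute-force search with a polynomial-time assignment solver.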
Pages: 197-202
Number of pages: 6