Similarity- and Quality-Guided Relation Learning for Joint Detection and Tracking

Cited by: 1
Authors
Feng, Weitao [1]
Bai, Lei [2]
Yao, Yongqiang [3]
Gan, Weihao [3]
Wu, Wei [3]
Ouyang, Wanli [4]
Affiliations
[1] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[3] SenseTime Grp Ltd, Hong Kong, Peoples R China
[4] Univ Sydney, Sch Elect & Informat Engn, Shanghai Artificial Intelligence Lab, Sydney, NSW 2006, Australia
Funding
Australian Research Council;
Keywords
Task analysis; Feature extraction; Videos; Correlation; Semantics; Target tracking; Object detection; Multi-object tracking; joint detection and tracking; similarity- and quality-guided attention; relation learning; instance-level spatial-temporal aggregation; OBJECT TRACKING;
DOI
10.1109/TMM.2023.3279670
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Joint detection and tracking, which addresses two fundamental vision tasks in a unified manner, is a challenging topic in computer vision. In this area, proper use of the spatial-temporal information in videos can help reduce local defects and improve the quality of feature representations. Although modeling low-level (usually pixel-wise) spatial-temporal information has been studied, instance-level spatial-temporal correlations (i.e., relations between semantic regions in which instances have occurred) have not been fully exploited. By comparison, modeling instance-level correlations is a more flexible and reasonable way to enhance feature representations. However, we found that conventional instance-level relation learning, which works for detection or tracking as separate tasks, is not effective for the joint task, in which a variety of scenarios may arise. To address this problem, we exploit instance-level spatial-temporal semantic information for joint detection and tracking through a joint relation learning pipeline built on a novel relation learning mechanism called Similarity- and Quality-Guided Attention (SQGA). Specifically, we add task-specific SQGA relation modules before the corresponding task prediction heads to refine each instance feature representation using features of reference instances in neighboring frames; these features are aggregated on the basis of relational affinities. In SQGA, relational affinities are factorized into similarity and quality terms so that fine-grained supervision rules can be applied. We then add a task-specific attention loss for each SQGA relation module, resulting in better feature aggregation for the corresponding task. Quantitative experiments on several challenging multi-object tracking benchmarks show that our approach is more effective than the baselines and provides competitive results compared with recent state-of-the-art methods.
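The abstract states only that relational affinities are factorized into similarity and quality terms and then used to aggregate reference-instance features. It gives no formulas, so the following is a minimal illustrative sketch and not the authors' implementation. Every specific here is an assumption: the function name `sqga_refine`, cosine similarity with a softmax temperature, a scalar quality score per reference instance, the product-then-renormalize affinity, and the residual sum.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def sqga_refine(target, refs, qualities, temperature=0.1):
    """Refine `target` using reference features weighted by similarity * quality.

    Hypothetical sketch: the affinity rule, the temperature, and the residual
    connection are assumptions made for illustration only.
    Returns (refined_feature, affinities).
    """
    if len(refs) != len(qualities):
        raise ValueError("one quality score is required per reference instance")
    if not refs:
        return list(target), []

    # Similarity term (assumed): softmax over temperature-scaled cosine similarity.
    sims = [cosine(target, r) for r in refs]
    peak = max(sims)
    exps = [math.exp((s - peak) / temperature) for s in sims]
    norm = sum(exps)
    sim_weights = [e / norm for e in exps]

    # Affinity (assumed): similarity weight times quality score, renormalized.
    raw = [s * q for s, q in zip(sim_weights, qualities)]
    total = sum(raw)
    if total == 0:
        return list(target), [0.0] * len(refs)
    affinities = [r / total for r in raw]

    # Aggregate reference features by affinity and add them to the target.
    aggregated = [
        sum(a * r[d] for a, r in zip(affinities, refs))
        for d in range(len(target))
    ]
    refined = [t + g for t, g in zip(target, aggregated)]
    return refined, affinities


# Usage: the third reference matches the target but has zero quality
# (for example, a heavily occluded instance), so it receives no weight.
refined, affinities = sqga_refine(
    target=[1.0, 0.0],
    refs=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    qualities=[0.9, 0.9, 0.0],
)
```

Keeping the two factors separate in this sketch reflects the motivation given in the abstract: each term could receive its own supervision signal, which a single fused attention weight would not allow.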
Pages: 1267-1280
Page count: 14