Similarity- and Quality-Guided Relation Learning for Joint Detection and Tracking

Cited by: 1
Authors
Feng, Weitao [1 ]
Bai, Lei [2 ]
Yao, Yongqiang [3 ]
Gan, Weihao [3 ]
Wu, Wei [3 ]
Ouyang, Wanli [4 ]
Affiliations
[1] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[3] SenseTime Grp Ltd, Hong Kong, Peoples R China
[4] Univ Sydney, Sch Elect & Informat Engn, Shanghai Artificial Intelligence Lab, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Task analysis; Feature extraction; Videos; Correlation; Semantics; Target tracking; Object detection; Multi-object tracking; joint detection and tracking; similarity- and quality-guided attention; relation learning; instance-level spatial-temporal aggregation; OBJECT TRACKING;
DOI
10.1109/TMM.2023.3279670
CLC Classification Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Joint detection and tracking, which solves two fundamental vision challenges in a unified manner, is a challenging topic in computer vision. In this area, the proper use of spatial-temporal information in videos can help reduce local defects and improve the quality of feature representations. Although modeling low-level (usually pixel-wise) spatial-temporal information has been studied, instance-level spatial-temporal correlations (i.e., relations between semantic regions in which instances have occurred) have not been fully exploited. In comparison, modeling instance-level correlation is a more flexible and reasonable way to enhance feature representations. However, we have found that conventional instance-level relation learning that works for the separate tasks of detection or tracking is not effective in joint tasks in which a variety of scenarios may be presented. To try to resolve this problem, in this study, we effectively exploited instance-level spatial-temporal semantic information for joint detection and tracking via a joint relation learning pipeline with a novel relation learning mechanism called Similarity- and Quality-Guided Attention (SQGA). Specifically, we added task-specific SQGA relation modules before the corresponding task prediction heads to refine the instance feature representation using features of other reference instances in the neighboring frames; these features are aggregated on the basis of relational affinities. In particular, in SQGA, relational affinities were factorized to similarity and quality terms so that fine-grained supervision rules could be applied. Then we added task-specific attention losses for each SQGA relation module, resulting in a better feature aggregation for the corresponding task. Quantitative experiments based on several challenging multi-object tracking benchmarks showed that our approach was more effective than the baselines and provided competitive results compared with recent state-of-the-art methods.
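The abstract describes aggregating features of reference instances from neighboring frames, with relational affinities factorized into similarity and quality terms. The paper's exact formulation is not given here, so the sketch below is only an illustrative assumption: a scaled dot-product similarity weight modulated by a per-reference quality score, followed by a residual refinement of the query feature. The function name `sqga_aggregate` and all shapes are hypothetical.

```python
import numpy as np

def sqga_aggregate(query, refs, quality):
    """Refine a target-instance feature using reference-instance features.

    query   : (d,)   feature of the target instance
    refs    : (n, d) features of reference instances from neighboring frames
    quality : (n,)   per-reference quality scores in [0, 1]

    This is a hypothetical SQGA-style factorization (similarity weight
    modulated by quality), not the paper's exact formulation.
    """
    # Similarity term: scaled dot product between query and references.
    sim = refs @ query / np.sqrt(query.shape[0])  # (n,)
    # Softmax over similarities (numerically stabilized).
    w = np.exp(sim - sim.max())
    w = w / w.sum()
    # Factorized affinity: similarity weight times quality term, renormalized.
    w = w * quality
    w = w / w.sum()
    # Residual refinement of the query feature by aggregated references.
    return query + w @ refs

rng = np.random.default_rng(0)
refs = rng.normal(size=(4, 8))      # 4 reference instances, 8-dim features
query = rng.normal(size=8)
quality = np.array([0.9, 0.1, 0.8, 0.2])
refined = sqga_aggregate(query, refs, quality)
print(refined.shape)  # (8,)
```

Low-quality references (small `quality`) contribute little to the aggregation even when they are similar to the query, which is the intuition behind factorizing the affinity rather than learning a single attention weight.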
Pages: 1267-1280
Number of pages: 14