Similarity- and Quality-Guided Relation Learning for Joint Detection and Tracking

Cited by: 1
Authors
Feng, Weitao [1 ]
Bai, Lei [2 ]
Yao, Yongqiang [3 ]
Gan, Weihao [3 ]
Wu, Wei [3 ]
Ouyang, Wanli [4 ]
Affiliations
[1] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[3] SenseTime Grp Ltd, Hong Kong, Peoples R China
[4] Univ Sydney, Sch Elect & Informat Engn, Shanghai Artificial Intelligence Lab, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Task analysis; Feature extraction; Videos; Correlation; Semantics; Target tracking; Object detection; Multi-object tracking; joint detection and tracking; similarity- and quality-guided attention; relation learning; instance-level spatial-temporal aggregation; OBJECT TRACKING;
DOI
10.1109/TMM.2023.3279670
CLC Classification Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Joint detection and tracking, which solves two fundamental vision challenges in a unified manner, is a challenging topic in computer vision. In this area, the proper use of spatial-temporal information in videos can help reduce local defects and improve the quality of feature representations. Although modeling low-level (usually pixel-wise) spatial-temporal information has been studied, instance-level spatial-temporal correlations (i.e., relations between semantic regions in which instances have occurred) have not been fully exploited. In comparison, modeling instance-level correlation is a more flexible and reasonable way to enhance feature representations. However, we have found that conventional instance-level relation learning that works for the separate tasks of detection or tracking is not effective in joint tasks in which a variety of scenarios may be presented. To try to resolve this problem, in this study, we effectively exploited instance-level spatial-temporal semantic information for joint detection and tracking via a joint relation learning pipeline with a novel relation learning mechanism called Similarity- and Quality-Guided Attention (SQGA). Specifically, we added task-specific SQGA relation modules before the corresponding task prediction heads to refine the instance feature representation using features of other reference instances in the neighboring frames; these features are aggregated on the basis of relational affinities. In particular, in SQGA, relational affinities were factorized to similarity and quality terms so that fine-grained supervision rules could be applied. Then we added task-specific attention losses for each SQGA relation module, resulting in a better feature aggregation for the corresponding task. Quantitative experiments based on several challenging multi-object tracking benchmarks showed that our approach was more effective than the baselines and provided competitive results compared with recent state-of-the-art methods.
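The abstract describes aggregating features of reference instances from neighboring frames, with relational affinities factorized into similarity and quality terms. The paper's exact formulation is not given here, so the sketch below is only an illustrative assumption: a scaled dot-product similarity weight modulated by a per-reference quality score, followed by a residual refinement of the query feature. The function name `sqga_aggregate` and all shapes are hypothetical.

```python
import numpy as np

def sqga_aggregate(query, refs, quality):
    """Refine a target-instance feature using reference-instance features.

    query   : (d,)   feature of the target instance
    refs    : (n, d) features of reference instances from neighboring frames
    quality : (n,)   per-reference quality scores in [0, 1]

    This is a hypothetical SQGA-style factorization (similarity weight
    modulated by quality), not the paper's exact formulation.
    """
    # Similarity term: scaled dot product between query and references.
    sim = refs @ query / np.sqrt(query.shape[0])  # (n,)
    # Softmax over similarities (numerically stabilized).
    w = np.exp(sim - sim.max())
    w = w / w.sum()
    # Factorized affinity: similarity weight times quality term, renormalized.
    w = w * quality
    w = w / w.sum()
    # Residual refinement of the query feature by aggregated references.
    return query + w @ refs

rng = np.random.default_rng(0)
refs = rng.normal(size=(4, 8))      # 4 reference instances, 8-dim features
query = rng.normal(size=8)
quality = np.array([0.9, 0.1, 0.8, 0.2])
refined = sqga_aggregate(query, refs, quality)
print(refined.shape)  # (8,)
```

Low-quality references (small `quality`) contribute little to the aggregation even when they are similar to the query, which is the intuition behind factorizing the affinity rather than learning a single attention weight.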
Pages: 1267-1280
Number of pages: 14