Cascaded matching based on detection box area for multi-object tracking

Times Cited: 3
Authors
Gu, Songbo [1 ]
Zhang, Miaohui [1 ]
Xiao, Qiyang [1 ]
Shi, Wentao [2 ]
Affiliations
[1] Henan Univ, Sch Artificial Intelligence, Zhengzhou 450046, Peoples R China
[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
Keywords
Deep learning; Multi-object tracking; Cascaded matching; Detection box
DOI
10.1016/j.knosys.2024.112075
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In the existing tracking-by-detection paradigm, advanced approaches rely on appearance features to establish associations between current detections and trajectories. However, these methods often suffer from slow tracking and suboptimal results, particularly when the appearance features are unreliable. To address these challenges, we propose a novel cascaded matching algorithm, the detection box area-based tracking algorithm (DBAT), which groups detection boxes by area and associates the detections within each group in a cascaded manner. To improve the accuracy of grouping, we introduce two components that raise the quality of detections: the compressed self-decoding module (CSDM) and the task collaboration module (TCM). To obtain more precise location information and richer features, CSDM decomposes the input features into two one-dimensional feature encodings and one two-dimensional feature encoding. These encodings then aggregate features along both spatial directions to capture long-range dependencies and refine location information. Finally, the aggregated features interact with the original features, fusing information and strengthening the overall feature representation. To alleviate conflicts between tasks and bolster task-specific representations, TCM combines different receptive fields and decouples features through self-relationship and cross-relationship mappings, thereby enhancing learning across tasks simultaneously. Extensive experiments demonstrate that our method achieves performance comparable to state-of-the-art methods on the MOT17, MOT20, and DanceTrack benchmarks.
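The core idea of area-based cascaded matching can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the area thresholds, and the use of greedy IoU association (in place of DBAT's full association pipeline, which the abstract does not specify) are all our own assumptions.

```python
# Hypothetical sketch: group detection boxes by area, then associate each group
# with tracks in cascade (largest boxes first). Greedy IoU matching stands in
# for the paper's actual association step; thresholds are illustrative only.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def group_by_area(dets, thresholds=(10000.0, 2500.0)):
    """Split detections into large / medium / small groups by box area."""
    groups = [[] for _ in range(len(thresholds) + 1)]
    for d in dets:
        a = (d[2] - d[0]) * (d[3] - d[1])
        for i, t in enumerate(thresholds):
            if a >= t:
                groups[i].append(d)
                break
        else:
            groups[-1].append(d)  # smaller than every threshold
    return groups

def cascaded_match(tracks, dets, iou_thresh=0.3):
    """Associate detections to tracks group by group; each track matches once."""
    matches, free_tracks = [], list(range(len(tracks)))
    for group in group_by_area(dets):          # cascade: largest group first
        for d in group:
            best, best_iou = None, iou_thresh
            for ti in free_tracks:
                s = iou(tracks[ti], d)
                if s > best_iou:
                    best, best_iou = ti, s
            if best is not None:
                matches.append((best, d))
                free_tracks.remove(best)       # consume the matched track
    return matches
```

Processing groups in cascade means large, typically more reliable boxes claim their tracks before small boxes are considered, which mirrors the grouping motivation described in the abstract.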
Pages: 11