Multi-object tracking using context-sensitive enhancement via feature fusion

Times cited: 0
Authors
Zhou, Yan [1]
Chen, Junyu [1]
Wang, Dongli [1]
Zhu, Xiaolin [2]
Affiliations
[1] Xiangtan Univ, Sch Automat & Elect Informat, Xiangtan 411105, Peoples R China
[2] Xiangtan Univ, Sch Math & Computat Sci, Xiangtan 411105, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-object tracking; Inception convolution; Weighted bidirectional pyramid; Feature fusion; Context-sensitive prediction modules; MULTITARGET TRACKING;
DOI
10.1007/s11042-023-16027-z
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multi-object tracking (MOT) is one of the most challenging tasks in computer vision. Most MOT methods handle pedestrian features such as size and appearance poorly, which easily leads to missed detections and failures under occlusion. To address this, an end-to-end multi-target tracking network with feature fusion and feature enhancement is proposed. The network framework integrates feature extraction, object detection, and data association. It takes two adjacent frames as an input chain node and uses Inception convolution as the backbone network, whose special pre-training weights increase the network's receptive field for multiple targets. In addition, the feature fusion module stacks a weighted bidirectional pyramid structure three times, which focuses more on key features and enhances adaptability to target deformation. To handle crowding in complex scenes, context-sensitive prediction modules are added, which contain deeper and wider convolutions to enhance the key information between targets. After this processing, three loss function branches are formed, where the classification branch and the identity branch together form the attention that is multiplied by the regression branch to ensure regression accuracy. In experiments on the MOT16 and MOT17 datasets, our model reaches MOTA scores of 67.9 and 67.7, respectively, at frame rates of up to 30 FPS on a single GPU, and its visualization results improve on those of Chain-Tracker.
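As an illustration of the weighted fusion idea mentioned in the abstract, the sketch below shows one common way to combine feature maps with learnable non-negative weights (the "fast normalized fusion" used in weighted bidirectional pyramids). The abstract does not state which fusion rule the authors use, so the rule itself, the function name fuse_weighted, and the eps parameter are assumptions made here for illustration only. Feature maps are represented as flat lists of floats to keep the example self-contained.

```python
def fuse_weighted(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with normalized non-negative weights.

    Hypothetical helper (not from the paper). Computes
    sum_i(w_i * f_i) / (eps + sum_i(w_i)), with each w_i clipped at zero.
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature vector")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")

    # Clip weights at zero so every input contributes non-negatively.
    clipped = [max(0.0, w) for w in weights]
    total = eps + sum(clipped)

    # Weighted sum per position, normalized by the total weight.
    return [
        sum(w * f[i] for w, f in zip(clipped, features)) / total
        for i in range(length)
    ]


if __name__ == "__main__":
    top_down = [1.0, 2.0]
    lateral = [3.0, 4.0]
    print(fuse_weighted([top_down, lateral], [1.0, 1.0]))
```

With equal weights the result is close to the element-wise mean of the inputs; a larger weight pulls the fused output toward the corresponding input, which is how such a layer can emphasize more informative feature levels.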
Pages: 19465-19484
Number of pages: 20