Transformer-Based Multi-Player Tracking and Skill Recognition Framework for Volleyball Analytics

Cited by: 1
Authors
Jiang, Lei [1]
Yang, Zhihong [1]
Gang, Lei [1]
Affiliations
[1] Cent South Univ, Sports, Changsha 410083, Hunan, Peoples R China
Keywords
Volleyball; player tracking; deep learning; transformer; recognition; multi-model
DOI
10.1109/ACCESS.2025.3526775
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Volleyball is a dynamic sport requiring high technical skills and tactical awareness, demanding effective training methods for performance improvement. Traditional training approaches often rely heavily on subjective analysis by coaches, leading to inconsistencies in skill development. The integration of advanced computing technologies has opened new avenues in sports analytics, enabling data-driven methods for performance monitoring and strategy development. Despite advancements in sports like baseball, soccer, and basketball, volleyball has not yet been thoroughly explored for computer-assisted analysis, particularly in player tracking and skill recognition. This study proposes a novel Volleyball Player Tracking and Skill Analytics (VPTSA) framework designed specifically for comprehensive volleyball match analysis. The framework integrates multi-player tracking and action recognition systems, leveraging YOLOv7 for player detection and a memory transformer to maintain spatiotemporal information for accurate tracking. Additionally, a transformer-based action recognition module identifies volleyball maneuvers, providing detailed analytics on player performance and behavior. The proposed methodology effectively overcomes challenges such as occlusions, similar appearances among players, and complex motion patterns, which are prevalent in high-intensity sports like volleyball. Results demonstrate that our model achieves superior performance, with an IDF1 score of 72.6, MOTA of 94.8, and HOTA of 73.7, outperforming state-of-the-art models such as TransTrack and ByteTrack on the SportsMOT dataset. In terms of action recognition, our Volleyball Skill Analytics Network (VSAN) model outperforms existing methods, achieving an individual action mAP of 85.5% and a group action mAP of 90.7% on the Volleyball dataset, demonstrating its efficacy in accurately identifying and classifying volleyball maneuvers.
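The tracking scores quoted in the abstract (MOTA, IDF1) follow standard multi-object tracking definitions. The sketch below shows how these two metrics are conventionally computed from error counts. It is not taken from the paper: the function names and the toy counts are hypothetical, and the results are fractions in [0, 1], whereas the abstract reports values on a 0 to 100 scale. HOTA is omitted because it requires averaging over localization thresholds.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth boxes over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    The counts come from the identity-level matching between predicted
    and ground-truth trajectories.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Toy counts chosen for illustration only (not results from the paper).
toy_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=200)
toy_idf1 = idf1(idtp=80, idfp=20, idfn=20)
print(f"MOTA = {toy_mota:.3f}, IDF1 = {toy_idf1:.3f}")
```

MOTA penalizes detection errors and identity switches together, so it is dominated by detector quality, while IDF1 measures how consistently identities are preserved over time. This is why a tracker can score much higher on MOTA than on IDF1, as in the figures reported above.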
Pages: 8806-8824
Page count: 19