ACA-Net: adaptive context-aware network for basketball action recognition

Cited by: 2
Authors
Zhang, Yaolei [1 ]
Zhang, Fei [2 ]
Zhou, Yuanli [3 ]
Xu, Xiao [4 ]
Affiliations
[1] Beijing Sport Univ, China Basketball Coll, Beijing, Peoples R China
[2] Hangzhou Normal Univ, Coll Phys Educ, Hangzhou, Zhejiang, Peoples R China
[3] Air Force Early Warning Acad, Radar Noncommissioned Officers Sch, Wuhan, Hubei, Peoples R China
[4] Dalian Univ, Coll Phys Educ, Dalian, Liaoning, Peoples R China
Keywords
basketball; action recognition; adaptive context-awareness; long short-term information; space-channel information interaction;
DOI
10.3389/fnbot.2024.1471327
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Advances in intelligent action recognition can be instrumental in developing autonomous robotic systems capable of analyzing complex human activities in real time, contributing to the growing field of robotics operating in dynamic environments. Precise recognition of basketball players' actions using artificial intelligence technology can provide valuable assistance and guidance to athletes, coaches, and analysts, and can help referees make fairer decisions during games. However, unlike action recognition in simpler scenarios, basketball scenes share visually similar and cluttered backgrounds, the differences between actions are subtle, and lighting conditions are inconsistent, making action recognition in basketball a challenging task. To address this problem, an Adaptive Context-Aware Network (ACA-Net) for basketball player action recognition is proposed in this paper. It contains a Long Short-term Adaptive (LSTA) module and a Triplet Spatial-Channel Interaction (TSCI) module to extract effective features at the temporal, spatial, and channel levels. The LSTA module adaptively learns global and local temporal features of the video. The TSCI module enhances the feature representation by learning the interaction features between space and channels. We conducted extensive experiments on the popular basketball action recognition datasets SpaceJam and Basketball-51. The results show that ACA-Net outperforms current mainstream methods, achieving 89.26% and 92.05% classification accuracy on the two datasets, respectively. ACA-Net's adaptable architecture also holds potential for real-world applications in autonomous robotics, where accurate recognition of complex human actions in unstructured environments is crucial for tasks such as automated game analysis, player performance evaluation, and enhanced interactive broadcasting experiences.
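The abstract describes the LSTA and TSCI modules only at a high level. The following is a minimal PyTorch sketch of how two such modules could be organized and attached to 3D backbone features; the class names, module internals (temporal convolution, gating, channel/spatial attention), and hyperparameters are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn

class LSTAModule(nn.Module):
    # Long Short-term Adaptive block (assumed design): blends a short-term path
    # (local temporal convolution) with a long-term cue (global temporal pooling)
    # through a learned gate, i.e., "adaptive" global/local temporal fusion.
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):  # x: (N, C, T, H, W)
        local = self.local(x)                       # short-term temporal features
        global_ctx = x.mean(dim=(2, 3, 4))          # (N, C) long-term summary
        w = self.gate(global_ctx).view(x.size(0), -1, 1, 1, 1)
        return w * local + (1.0 - w) * x            # adaptive mix of local and global cues

class TSCIModule(nn.Module):
    # Spatial-channel interaction block (assumed design): channel attention and
    # spatial attention applied jointly so the two dimensions modulate each other.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(nn.Conv3d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):  # x: (N, C, T, H, W)
        return x * self.channel_att(x) * self.spatial_att(x)

# Usage example: apply both modules to clip features from any 3D backbone.
feats = torch.randn(2, 64, 16, 28, 28)              # (batch, channels, frames, H, W)
out = TSCIModule(64)(LSTAModule(64)(feats))
print(out.shape)                                     # torch.Size([2, 64, 16, 28, 28])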
Pages: 15