Video-Based Martial Arts Combat Action Recognition and Position Detection Using Deep Learning

Cited by: 0

Authors:

Wu, Baoyuan [1,2,3]

Zhou, Jiali [4]

Affiliations:

[1] Chengdu Sport Univ, Sch Wushu, Chengdu 610093, Peoples R China

[2] Chengdu Sport Univ, Chinese Guoshu Acad, Chengdu 610093, Peoples R China

[3] Adamson Univ, Coll Educ, Manila 1000, Philippines

[4] Sichuan Technol & Business Univ, Sch Phys Educ, Chengdu 611745, Peoples R China

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Art; Three-dimensional displays; Location awareness; Feature extraction; Skeleton; Training; Point cloud compression; Image recognition; Face recognition; Computational modeling; Deep learning; vision transformer; event detection; video classification; martial art; NETWORKS;

DOI:

10.1109/ACCESS.2024.3487289

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Action recognition in martial arts can offer valuable insights for technicians, athletes, and coaches. Accurate action recognition can enhance performance analysis, inform training strategies, and improve decision-making by providing detailed evaluations of technique execution, movement patterns, and match dynamics. This can lead to more effective coaching, better athlete preparation, and a deeper understanding of competitive outcomes. Existing human action recognition methods often struggle with background clutter, occlusion, and variations in appearance and speed, particularly in dynamic combat scenarios. In this study, we propose a novel Spatio-Temporal Hierarchical Keypoint Aggregation (ST-HKA) framework for martial arts combat action recognition and localization. Unlike conventional methods that rely on graph convolutional networks or appearance-based techniques, ST-HKA is a deep learning approach that treats human skeleton keypoints as a 3D point cloud, significantly improving scalability and robustness against occlusion and variations in appearance. Additionally, we incorporate weakly supervised spatio-temporal action localization using a Context-Aware Pooling Mechanism. The proposed model was evaluated on both the Kinetics Human-Action and Taekwondo datasets, demonstrating superior performance in recognizing complex martial arts actions. The ST-HKA model achieves a Top-1 accuracy of 88.6% and an F1-score of 83.9% on the Kinetics dataset, and an accuracy of 88.7% and an F1-score of 84.4% on the Taekwondo dataset. The proposed model also detects temporal boundaries more precisely, as reflected by its strong performance in action localization tasks. These results highlight the effectiveness of ST-HKA in handling complex martial arts actions with high accuracy and robustness.
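
The abstract does not spell out how skeleton keypoints become a 3D point cloud. As a rough illustration only, one plausible reading is that each 2D keypoint (x, y) detected in frame t becomes a point (x, y, t), so that a whole clip turns into a single unordered point set. The sketch below assumes this reading; the function name `keypoints_to_point_cloud`, the time normalization, and the toy coordinates are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float]         # (x, y) in normalized image coordinates
Point3D = Tuple[float, float, float]   # (x, y, t)


def keypoints_to_point_cloud(frames: List[List[Keypoint]]) -> List[Point3D]:
    """Flatten per-frame 2D keypoints into one (x, y, t) point set.

    Assumption (not from the paper): the third coordinate is the frame
    index rescaled to [0, 1] so it is on a scale comparable to x and y.
    """
    n_frames = len(frames)
    cloud: List[Point3D] = []
    for t, joints in enumerate(frames):
        t_norm = t / (n_frames - 1) if n_frames > 1 else 0.0
        for x, y in joints:
            cloud.append((x, y, t_norm))
    return cloud


# Toy clip: 3 frames, 2 keypoints per frame (made-up values).
clip = [
    [(0.10, 0.20), (0.30, 0.40)],
    [(0.12, 0.22), (0.31, 0.41)],
    [(0.15, 0.25), (0.33, 0.43)],
]
cloud = keypoints_to_point_cloud(clip)
```

Because the result is a flat point set rather than a fixed joint graph, a keypoint missed by the detector (for example under occlusion) simply yields fewer points, which is one way to picture the robustness the abstract describes.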


Pages: 161357-161374

Number of pages: 18
