Extraction and Classification of Diving Clips from Continuous Video Footage

Cited by: 16
Authors
Nibali, Aiden [1]
He, Zhen [1]
Morgan, Stuart [1, 2]
Greenwood, Daniel [2]
Affiliations
[1] La Trobe Univ, Bundoora, Vic, Australia
[2] Australian Inst Sport, Bruce, Australia
Source
2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW) | 2017
Keywords
ACTION RECOGNITION;
DOI
10.1109/CVPRW.2017.18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Due to recent advances in technology, the recording and analysis of video data has become an increasingly common component of athlete training programmes. Today it is incredibly easy and affordable to set up a fixed camera and record athletes in a wide range of sports, such as diving, gymnastics, golf, tennis, etc. However, the manual analysis of the obtained footage is a time-consuming task which involves isolating actions of interest and categorizing them using domain-specific knowledge. In order to automate this kind of task, three challenging sub-problems are often encountered: 1) temporally cropping events/actions of interest from continuous video; 2) tracking the object of interest; and 3) classifying the events/actions of interest. Most previous work has focused on solving just one of the above sub-problems in isolation. In contrast, this paper provides a complete solution to the overall action monitoring task in the context of a challenging real-world exemplar. Specifically, we address the problem of diving classification. This is a challenging problem since the person (diver) of interest typically occupies fewer than 1% of the pixels in each frame. The model is required to learn the temporal boundaries of a dive, even though other divers and bystanders may be in view. Finally, the model must be sensitive to subtle changes in body pose over a large number of frames to determine the classification code. We provide effective solutions to each of the sub-problems which combine to provide a highly functional solution to the task as a whole. The techniques proposed can be easily generalized to video footage recorded from other sports.
Pages: 94-104
Page count: 11