Player Identification in Hockey Broadcast Videos

Times cited: 15
Authors
Chan, Alvin [1]
Levine, Martin D. [1]
Javan, Mehrsan [1, 2]
Affiliations
[1] McGill Univ, 845 Rue Sherbrooke O, Montreal, PQ H3A 0G4, Canada
[2] Sportlogiq, 5455 Ave Gaspe Suite 570, Montreal, PQ H2T 3B3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Computer vision; Recurrent models; Convolutional neural networks; Sports player identification; Jersey numbers; Broadcast hockey videos
DOI
10.1016/j.eswa.2020.113891
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a deep recurrent convolutional neural network (CNN) approach to hockey player identification in NHL broadcast videos. Player identification is a difficult computer vision problem, mainly because players look alike, are frequently occluded, and have blurry facial and physical features. However, players' jersey numbers can be observed over time by processing variable-length image sequences of each player (tracklets). We propose an end-to-end trainable ResNet+LSTM network, with a residual network (ResNet) base and a long short-term memory (LSTM) layer, that discovers spatio-temporal features of jersey numbers over time and learns long-term dependencies. Additionally, we employ a secondary one-dimensional CNN classifier as a late score-level fusion method to classify the output of the ResNet+LSTM network. For this work, we created a new hockey player tracklet dataset that contains sequences of hockey player bounding boxes. The proposed approach achieves an overall player identification accuracy of over 87% on the test split of this dataset.
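The late score-level fusion step can be illustrated with a small, untrained sketch in plain Python. It assumes the ResNet+LSTM network has already produced one class-score vector per frame of a tracklet. The temporal filter width, the padding of short tracklets by repeating the last frame, the ReLU, and the global max pooling over time are assumptions made for illustration, not details stated in the abstract; `conv1d_fusion`, the three classes, and the hand-set weights are hypothetical. The sketch only shows how a 1D convolution along the time axis can turn a variable-length sequence of per-frame scores into a single decision for the whole tracklet.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are frames (time steps), columns are classes


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_fusion(frame_scores: Matrix, kernels: List[Matrix], biases: List[float],
                  out_weights: Matrix, out_bias: List[float]) -> List[float]:
    """Hypothetical late fusion: per-frame class scores -> one tracklet distribution.

    frame_scores: T x C scores, assumed to be the ResNet+LSTM per-frame output.
    kernels:      F filters, each W x C (W is an assumed temporal window).
    biases:       F filter biases.
    out_weights:  N x F weights of a final linear layer; out_bias has N entries.
    """
    width = len(kernels[0])
    if len(frame_scores) < width:
        # Assumption: pad short tracklets by repeating the last frame.
        frame_scores = frame_scores + [frame_scores[-1]] * (width - len(frame_scores))
    pooled = []
    for kernel, bias in zip(kernels, biases):
        responses = []
        # "Valid" 1D convolution (cross-correlation) along the time axis.
        for t in range(len(frame_scores) - width + 1):
            s = bias
            for dt in range(width):
                s += sum(k * x for k, x in zip(kernel[dt], frame_scores[t + dt]))
            responses.append(max(0.0, s))  # ReLU
        # Global max pooling over time: the feature size no longer depends on T.
        pooled.append(max(responses))
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(out_weights, out_bias)]
    return softmax(logits)


# Toy setup with 3 hypothetical classes; filter c sums class c's score over 2 frames.
NUM_CLASSES, WIDTH = 3, 2
kernels = [[[1.0 if j == c else 0.0 for j in range(NUM_CLASSES)] for _ in range(WIDTH)]
           for c in range(NUM_CLASSES)]
identity = [[1.0 if i == j else 0.0 for j in range(NUM_CLASSES)] for i in range(NUM_CLASSES)]
zeros = [0.0] * NUM_CLASSES

tracklet = [[0.2, 0.7, 0.1],  # per-frame scores; the third frame looks like a misread
            [0.1, 0.8, 0.1],
            [0.6, 0.3, 0.1],
            [0.1, 0.9, 0.0]]
probs = conv1d_fusion(tracklet, kernels, zeros, identity, zeros)
print(probs.index(max(probs)))  # → 1
```

Global pooling is one common way to let a 1D CNN accept tracklets of any length; padding or cropping every tracklet to a fixed length would be an alternative. In a trained system, the filter and output weights would be learned rather than set by hand.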
Pages: 9