Sports Video Analysis on Large-Scale Data

Cited by: 6
Authors
Wu, Dekun [1]
Zhao, He [2]
Bao, Xingce [3]
Wildes, Richard P. [2]
Affiliations
[1] Univ Pittsburgh, Pittsburgh, PA 15260 USA
[2] York Univ, Toronto, ON, Canada
[3] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
COMPUTER VISION, ECCV 2022, PT XXXVII | 2022 / Vol. 13697
DOI
10.1007/978-3-031-19836-6_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates the modeling of automated machine description on sports video, which has seen much progress recently. Nevertheless, state-of-the-art approaches fall quite short of capturing how human experts analyze sports scenes. There are several major reasons: (1) The used dataset is collected from non-official providers, which naturally creates a gap between models trained on those datasets and real-world applications; (2) previously proposed methods require extensive annotation efforts (i.e., player and ball segmentation at pixel level) on localizing useful visual features to yield acceptable results; (3) very few public datasets are available. In this paper, we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning, to address the above challenges. We also design a unified approach to process raw videos into a stack of meaningful features with minimum labelling efforts, showing that cross modeling on such features using a transformer architecture leads to strong performance. In addition, we demonstrate the broad application of NSVA by addressing two additional tasks, namely fine-grained sports action recognition and salient player identification.
Pages: 19-36
Page count: 18