Sports Video Analysis on Large-Scale Data

Cited by: 6
Authors
Wu, Dekun [1]
Zhao, He [2]
Bao, Xingce [3]
Wildes, Richard P. [2]
Affiliations
[1] Univ Pittsburgh, Pittsburgh, PA 15260 USA
[2] York Univ, Toronto, ON, Canada
[3] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
COMPUTER VISION, ECCV 2022, PT XXXVII | 2022 / Vol. 13697
DOI
10.1007/978-3-031-19836-6_2
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates automated machine description of sports video, an area that has seen much progress recently. Nevertheless, state-of-the-art approaches fall well short of capturing how human experts analyze sports scenes. There are several major reasons: (1) the datasets used are collected from non-official providers, which naturally creates a gap between models trained on those datasets and real-world applications; (2) previously proposed methods require extensive annotation effort (i.e., player and ball segmentation at the pixel level) to localize useful visual features and yield acceptable results; (3) very few public datasets are available. In this paper, we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA), with a focus on captioning, to address the above challenges. We also design a unified approach that processes raw videos into a stack of meaningful features with minimal labelling effort, and show that cross-modeling such features with a transformer architecture leads to strong performance. In addition, we demonstrate the broad applicability of NSVA by addressing two further tasks: fine-grained sports action recognition and salient player identification.
Pages: 19-36
Page count: 18