Sports Video Analysis on Large-Scale Data

Cited by: 6

Authors:

Wu, Dekun [1]

Zhao, He [2]

Bao, Xingce [3]

Wildes, Richard P. [2]

Affiliations:

[1] Univ Pittsburgh, Pittsburgh, PA 15260 USA

[2] York Univ, Toronto, ON, Canada

[3] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

Source:

COMPUTER VISION, ECCV 2022, PT XXXVII | 2022 / Vol. 13697

DOI:

10.1007/978-3-031-19836-6_2

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper investigates the modeling of automated machine description of sports video, which has seen much progress recently. Nevertheless, state-of-the-art approaches fall well short of capturing how human experts analyze sports scenes. There are several major reasons: (1) the datasets used are collected from non-official providers, which naturally creates a gap between models trained on those datasets and real-world applications; (2) previously proposed methods require extensive annotation effort (i.e., player and ball segmentation at the pixel level) to localize useful visual features and yield acceptable results; (3) very few public datasets are available. In this paper, we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA), with a focus on captioning, to address the above challenges. We also design a unified approach that processes raw videos into a stack of meaningful features with minimal labelling effort, and show that cross modeling on such features using a transformer architecture leads to strong performance. In addition, we demonstrate the broad applicability of NSVA by addressing two additional tasks, namely fine-grained sports action recognition and salient player identification.

Pages: 19-36

Number of pages: 18

