Sports Video Analysis on Large-Scale Data

Cited by: 6

Authors:

Wu, Dekun ^{[1]}

Zhao, He ^{[2]}

Bao, Xingce ^{[3]}

Wildes, Richard P. ^{[2]}

Affiliations:

[1] Univ Pittsburgh, Pittsburgh, PA 15260 USA

[2] York Univ, Toronto, ON, Canada

[3] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

Source:

COMPUTER VISION, ECCV 2022, PT XXXVII | 2022 / Vol. 13697

DOI:

10.1007/978-3-031-19836-6_2

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper investigates the modeling of automated machine description on sports video, which has seen much progress recently. Nevertheless, state-of-the-art approaches fall quite short of capturing how human experts analyze sports scenes. There are several major reasons: (1) The used dataset is collected from non-official providers, which naturally creates a gap between models trained on those datasets and real-world applications; (2) previously proposed methods require extensive annotation efforts (i.e., player and ball segmentation at pixel level) on localizing useful visual features to yield acceptable results; (3) very few public datasets are available. In this paper, we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning, to address the above challenges. We also design a unified approach to process raw videos into a stack of meaningful features with minimum labelling efforts, showing that cross modeling on such features using a transformer architecture leads to strong performance. In addition, we demonstrate the broad application of NSVA by addressing two additional tasks, namely fine-grained sports action recognition and salient player identification.

Pages: 19-36

Page count: 18
