TTNet: Real-time temporal and spatial video analysis of table tennis

Cited by: 49

Authors:

Voeikov, Roman [1]

Falaleev, Nikolay [1]

Baikulov, Ruslan [1]

Affiliation:

[1] OSAI, Moscow, Russia

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020

DOI:

10.1109/CVPRW50498.2020.00450

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present TTNet, a neural network aimed at real-time processing of high-resolution table tennis videos, providing both temporal (event spotting) and spatial (ball detection and semantic segmentation) data. This approach provides the core information an auto-referee system needs to reason about score updates. We also publish OpenTTGames, a multi-task dataset of table tennis game videos recorded at 120 fps and labeled with events, semantic segmentation masks, and ball coordinates, for the evaluation of multi-task approaches, primarily oriented toward spotting quick events and tracking small objects. TTNet demonstrated 97.0% accuracy in game event spotting along with 2 pixels RMSE in ball detection with 97.5% accuracy on the test part of the presented dataset. The proposed network allows the processing of down-scaled full HD videos with inference time below 6 ms per input tensor on a machine with a single consumer-grade GPU. Thus, we contribute to the development of real-time multi-task deep learning applications and present an approach that is potentially capable of substituting manual data collection by sports scouts, providing support for referees' decision-making, and gathering extra information about the game process.

Pages: 3857-3865

Page count: 9
