Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

Cited by: 65

Authors:

Dwibedi, Debidatta ^{[1]}

Aytar, Yusuf ^{[2]}

Tompson, Jonathan ^{[1]}

Sermanet, Pierre ^{[1]}

Zisserman, Andrew ^{[2]}

Affiliations:

[1] Google Research, Mountain View, CA 94043, USA

[2] DeepMind, London, England

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020

Keywords:

RECOGNITION;

DOI:

10.1109/CVPR42600.2020.01040

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck, which allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds the state-of-the-art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (approximately 90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos. Project webpage: https://sites.google.com/view/repnet.
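The two ideas named in the abstract, building synthetic repetitions from short clips and reading the period off a temporal self-similarity matrix, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: frames are stood in for by random feature vectors, similarity is taken to be negative squared Euclidean distance, and the function names (`make_synthetic_repetition`, `self_similarity`, `estimate_period`) are hypothetical. In particular, `estimate_period` is a hand-written diagonal heuristic, not the learned period predictor that RepNet trains.

```python
import random


def make_synthetic_repetition(clip, count):
    """Repeat a short clip (a list of per-frame feature vectors) `count` times."""
    return [frame for _ in range(count) for frame in clip]


def self_similarity(frames):
    """Temporal self-similarity matrix: entry (i, j) is the negative squared
    Euclidean distance between frames i and j (0 means identical frames)."""
    def neg_sq_dist(a, b):
        return -sum((x - y) ** 2 for x, y in zip(a, b))
    return [[neg_sq_dist(fi, fj) for fj in frames] for fi in frames]


def estimate_period(tsm):
    """Return the smallest lag whose off-diagonal has the highest mean similarity.
    Assumption: a simple heuristic standing in for a learned predictor."""
    n = len(tsm)
    best_lag, best_score = None, float("-inf")
    for lag in range(1, n // 2 + 1):
        score = sum(tsm[i][i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:  # strict comparison keeps the smallest best lag
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(0)
clip = [[rng.random() for _ in range(8)] for _ in range(5)]  # 5 frames, 8-dim features
video = make_synthetic_repetition(clip, count=4)             # 20 frames, true period 5
tsm = self_similarity(video)
period = estimate_period(tsm)
count = len(video) // period
```

Because the matrix depends only on how frames relate to one another and not on what they depict, the same period readout applies to any repeated content, which is the intuition behind the class-agnostic claim.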


Pages: 10384-10393

Page count: 10
