Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

Cited by: 65
Authors
Dwibedi, Debidatta [1]
Aytar, Yusuf [2]
Tompson, Jonathan [1]
Sermanet, Pierre [1]
Zisserman, Andrew [2]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] DeepMind, London, England
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
Keywords
RECOGNITION;
DOI
10.1109/CVPR42600.2020.01040
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck that allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds the state-of-the-art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (approximately 90 times larger than existing datasets), which captures the challenges of repetition counting in real-world videos. Project webpage: https://sites.google.com/view/repnet.
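To make the abstract's two data-side ideas concrete, here is a minimal NumPy sketch written under our own assumptions; it is not RepNet's implementation. It shows (a) building a synthetic repeating video by tiling one short clip a chosen number of times, which yields a per-frame period label for free, and (b) a temporal self-similarity matrix (TSM) over per-frame embeddings. The helper names `make_synthetic_repetition`, `temporal_self_similarity` and `count_from_periods` are hypothetical. We assume the similarity is a negative squared Euclidean distance followed by a row-wise softmax. We also assume a count can be read off per-frame periods by summing 1/period over periodic frames. Random vectors stand in for the learned frame encoder, and the learned period predictor is omitted entirely.

```python
import numpy as np


def make_synthetic_repetition(video, start, clip_len, num_reps):
    """Tile one short clip `num_reps` times (hypothetical helper).

    video: array of frames with shape (T, ...), e.g. (T, H, W, C) or (T, D).
    Returns the repeated frames and a per-frame period label (in frames).
    """
    clip = video[start:start + clip_len]
    if clip_len < 1 or len(clip) != clip_len:
        raise ValueError("clip must lie entirely inside the video")
    frames = np.concatenate([clip] * num_reps, axis=0)
    periods = np.full(len(frames), clip_len, dtype=float)
    return frames, periods


def temporal_self_similarity(embeddings, temperature=1.0):
    """Temporal self-similarity matrix over per-frame embeddings of shape (T, D).

    Assumed similarity: negative squared Euclidean distance, normalised
    into a distribution over frames with a row-wise softmax.
    """
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    logits = -np.sum(diff ** 2, axis=-1) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # stabilise exp()
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def count_from_periods(periods, periodic=None):
    """Repetition count as the sum of 1/period over frames flagged periodic."""
    periods = np.asarray(periods, dtype=float)
    if periodic is None:
        periodic = np.ones(periods.shape, dtype=bool)
    return float(np.sum(np.where(periodic, 1.0 / periods, 0.0)))


# Usage: random 8-d "frame embeddings" stand in for a learned encoder.
rng = np.random.default_rng(0)
video = rng.normal(size=(50, 8))
frames, periods = make_synthetic_repetition(video, start=10, clip_len=4, num_reps=5)
tsm = temporal_self_similarity(frames)
print(tsm.shape, count_from_periods(periods))  # prints (20, 20) 5.0
```

Because frame `i` and frame `i + 4` are identical in the tiled video, their rows of the TSM place equal weight on each other. The resulting periodic stripe pattern is the kind of class-agnostic cue the abstract says the model is forced to rely on, independent of what the repeated action actually looks like.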
Pages: 10384-10393
Number of pages: 10