W-TALC: Weakly-Supervised Temporal Activity Localization and Classification

被引:239
作者
Paul, Sujoy [1 ]
Roy, Sourya [1 ]
Roy-Chowdhury, Amit K. [1 ]
机构
[1] Univ Calif Riverside, Riverside, CA 92521 USA
来源
COMPUTER VISION - ECCV 2018, PT IV | 2018年 / 11208卷
关键词
Weakly-supervised; Activity localization; Co-activity similarity loss; ACTION RECOGNITION;
D O I
10.1007/978-3-030-01225-0_35
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Most activity localization methods in the literature suffer from the burden of frame-wise annotation requirement. Learning from weak labels may be a potential solution towards reducing such manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. The proposed network can be divided into two sub-networks, namely the Two-Stream based feature extractor network and a weakly-supervised module, which we learn by optimizing two complimentary loss functions. Qualitative and quantitative results on two challenging datasets - Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieve better performance than current state-of-the-art methods.
引用
收藏
页码:588 / 607
页数:20
相关论文
共 71 条
[1]   Human Activity Analysis: A Review [J].
Aggarwal, J. K. ;
Ryoo, M. S. .
ACM COMPUTING SURVEYS, 2011, 43 (03)
[2]  
[Anonymous], 2004, Technical report
[3]  
Arandjelovic R, 2018, IEEE T PATTERN ANAL, V40, P1437, DOI [10.1109/TPAMI.2017.2711011, 10.1109/CVPR.2016.572]
[4]   What's the Point: Semantic Segmentation with Point Supervision [J].
Bearman, Amy ;
Russakovsky, Olga ;
Ferrari, Vittorio ;
Fei-Fei, Li .
COMPUTER VISION - ECCV 2016, PT VII, 2016, 9911 :549-565
[5]   Weakly Supervised Deep Detection Networks [J].
Bilen, Hakan ;
Vedaldi, Andrea .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :2846-2854
[6]   Weakly-Supervised Alignment of Video With Text [J].
Bojanowski, P. ;
Lajugie, R. ;
Grave, E. ;
Bach, F. ;
Laptev, I. ;
Ponce, J. ;
Schmid, C. .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4462-4470
[7]   Finding Actors and Actions in Movies [J].
Bojanowski, P. ;
Bach, F. ;
Laptev, I. ;
Ponce, J. ;
Schmid, C. ;
Sivic, J. .
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, :2280-2287
[8]  
Bojanowski P, 2014, LECT NOTES COMPUT SC, V8693, P628, DOI 10.1007/978-3-319-10602-1_41
[9]   Large-Scale Machine Learning with Stochastic Gradient Descent [J].
Bottou, Leon .
COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, :177-186
[10]  
Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698