W-TALC: Weakly-Supervised Temporal Activity Localization and Classification

Cited by: 217
Authors
Paul, Sujoy [1]
Roy, Sourya [1]
Roy-Chowdhury, Amit K. [1]
Affiliations
[1] Univ Calif Riverside, Riverside, CA 92521 USA
Source
COMPUTER VISION - ECCV 2018, PT IV | 2018 / Vol. 11208
Keywords
Weakly-supervised; Activity localization; Co-activity similarity loss; ACTION RECOGNITION;
DOI
10.1007/978-3-030-01225-0_35
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most activity localization methods in the literature suffer from the burden of requiring frame-wise annotation. Learning from weak labels may be a potential solution for reducing this manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. The proposed network can be divided into two sub-networks, namely a Two-Stream based feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieves better performance than current state-of-the-art methods.
Pages: 588 - 607
Page count: 20
Related papers
71 in total
  • [1] Human Activity Analysis: A Review
    Aggarwal, J. K.
    Ryoo, M. S.
    [J]. ACM COMPUTING SURVEYS, 2011, 43 (03)
  • [2] [Anonymous], 2004, Technical report
  • [3] Arandjelovic R, 2018, IEEE T PATTERN ANAL, V40, P1437, DOI [10.1109/TPAMI.2017.2711011, 10.1109/CVPR.2016.572]
  • [4] What's the Point: Semantic Segmentation with Point Supervision
    Bearman, Amy
    Russakovsky, Olga
    Ferrari, Vittorio
    Fei-Fei, Li
    [J]. COMPUTER VISION - ECCV 2016, PT VII, 2016, 9911 : 549 - 565
  • [5] Weakly Supervised Deep Detection Networks
    Bilen, Hakan
    Vedaldi, Andrea
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2846 - 2854
  • [6] Weakly-Supervised Alignment of Video With Text
    Bojanowski, P.
    Lajugie, R.
    Grave, E.
    Bach, F.
    Laptev, I.
    Ponce, J.
    Schmid, C.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4462 - 4470
  • [7] Finding Actors and Actions in Movies
    Bojanowski, P.
    Bach, F.
    Laptev, I.
    Ponce, J.
    Schmid, C.
    Sivic, J.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 2280 - 2287
  • [8] Bojanowski P, 2014, LECT NOTES COMPUT SC, V8693, P628, DOI 10.1007/978-3-319-10602-1_41
  • [9] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    [J]. COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [10] Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698