W-TALC: Weakly-Supervised Temporal Activity Localization and Classification

Cited by: 239

Authors:

Paul, Sujoy [1]

Roy, Sourya [1]

Roy-Chowdhury, Amit K. [1]

Affiliation:

[1] Univ Calif Riverside, Riverside, CA 92521 USA

Source:

COMPUTER VISION - ECCV 2018, PT IV | 2018 / Vol. 11208

Keywords:

Weakly-supervised; Activity localization; Co-activity similarity loss; Action recognition

DOI:

10.1007/978-3-030-01225-0_35

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Most activity localization methods in the literature suffer from the burden of requiring frame-wise annotation. Learning from weak labels may be a potential solution for reducing such manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. The proposed network can be divided into two sub-networks, namely a Two-Stream based feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieve better performance than current state-of-the-art methods.


Pages: 588-607

Page count: 20
