D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

Cited by: 94

Authors:

Chang, Chien-Yi [1]

Huang, De-An [1]

Sui, Yanan [1]

Li Fei-Fei [1]

Niebles, Juan Carlos [1]

Affiliation:

[1] Stanford Univ, Stanford, CA 94305 USA

Source:

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019

DOI:

10.1109/CVPR.2019.00366

CLC classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We address weakly supervised action alignment and segmentation in videos, where only the order of occurring actions is available during training. We propose Discriminative Differentiable Dynamic Time Warping (D3TW), the first discriminative model using weak ordering supervision. The key technical challenge for discriminative modeling with weak supervision is that the loss function of the ordering supervision is usually formulated using dynamic programming and is thus not differentiable. We address this challenge with a continuous relaxation of the min-operator in dynamic programming and extend the alignment loss to be differentiable. The proposed D3TW innovatively solves sequence alignment with discriminative modeling and end-to-end training, which substantially improves performance on weakly supervised action alignment and segmentation tasks. We show that our model is able to bypass the degenerate sequence problem usually encountered in previous work and outperform the current state-of-the-art across three evaluation metrics on two challenging datasets.
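The abstract's central step, replacing the hard min in the dynamic-programming alignment recursion with a continuous relaxation so the alignment cost becomes differentiable, can be illustrated with a minimal sketch. It assumes the log-sum-exp soft-min used in soft-DTW (Cuturi and Blondel, 2017), the standard three-move DTW recursion, and a hypothetical frame-by-action cost matrix; D3TW's own transition rules and discriminative loss are not reproduced here. The backward pass returns the gradient of the relaxed cost with respect to every cost entry (a soft alignment matrix), which is the property that end-to-end training relies on.

```python
import math


def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    lo = min(values)
    if lo == math.inf:
        return math.inf  # every predecessor is unreachable
    total = sum(math.exp(-(v - lo) / gamma) for v in values)
    return lo - gamma * math.log(total)


def soft_dtw_table(cost, gamma):
    """Forward pass: R[i][j] = cost[i-1][j-1] + softmin of the three predecessors."""
    n, m = len(cost), len(cost[0])
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (R[i - 1][j - 1], R[i - 1][j], R[i][j - 1])
            R[i][j] = cost[i - 1][j - 1] + softmin(prev, gamma)
    return R


def hard_dtw(cost):
    """Classic DTW with the non-differentiable hard min, for comparison."""
    n, m = len(cost), len(cost[0])
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i][j] = cost[i - 1][j - 1] + min(R[i - 1][j - 1], R[i - 1][j], R[i][j - 1])
    return R[n][m]


def soft_dtw_grad(cost, gamma):
    """Backward pass: d R[n][m] / d cost[i][j], i.e. a soft alignment matrix."""
    n, m = len(cost), len(cost[0])
    R = soft_dtw_table(cost, gamma)
    # Pad to (n + 2) x (m + 2); out-of-range cells hold -inf so their weight exp(...) is 0.
    Rp = [[-math.inf] * (m + 2) for _ in range(n + 2)]
    Cp = [[0.0] * (m + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            Rp[i][j] = R[i][j]
            Cp[i][j] = cost[i - 1][j - 1]
    Rp[n + 1][m + 1] = R[n][m]
    E = [[0.0] * (m + 2) for _ in range(n + 2)]
    E[n + 1][m + 1] = 1.0
    for i in range(n, 0, -1):
        for j in range(m, 0, -1):
            # Softmax weight that cell (i, j) receives inside each successor's softmin.
            a = math.exp((Rp[i + 1][j] - Rp[i][j] - Cp[i + 1][j]) / gamma)
            b = math.exp((Rp[i][j + 1] - Rp[i][j] - Cp[i][j + 1]) / gamma)
            c = math.exp((Rp[i + 1][j + 1] - Rp[i][j] - Cp[i + 1][j + 1]) / gamma)
            E[i][j] = a * E[i + 1][j] + b * E[i][j + 1] + c * E[i + 1][j + 1]
    return [row[1:m + 1] for row in E[1:n + 1]]


# Hypothetical cost matrix: 4 frames x 3 ordered actions (e.g. negative log-probabilities).
cost = [[0.1, 2.0, 3.0],
        [0.2, 1.5, 2.5],
        [2.0, 0.3, 1.0],
        [3.0, 2.0, 0.2]]
print("hard DTW:", hard_dtw(cost))
for gamma in (1.0, 0.1, 0.01):
    print("gamma", gamma, "soft DTW:", soft_dtw_table(cost, gamma)[-1][-1])
print("soft alignment:", soft_dtw_grad(cost, 0.5))
```

Because the soft-min never exceeds the true min, the relaxed cost approaches the hard DTW cost from below as gamma shrinks, while a larger gamma spreads the gradient over more alignment paths. Pure-Python loops are used only to keep the sketch self-contained; a trainable model would express the same recursion in a tensor library.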


Pages: 3541-3550

Page count: 10

References (41 in total):

[1] Alayrac, Jean-Baptiste; Bojanowski, Piotr; Agrawal, Nishant; Sivic, Josef; Laptev, Ivan; Lacoste-Julien, Simon. Unsupervised Learning from Narrated Instruction Videos. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 4575-4583.

[2] [Anonymous]. CVPR.

[3] [Anonymous]. ICCV, 2009.

[4] Bojanowski, P. ICCV, 2015.

[5] Bojanowski, P. Lecture Notes in Computer Science, vol. 8693, 2014: 628. DOI 10.1007/978-3-319-10602-1_41.

[6] Heilbron, F. C. Proc. CVPR IEEE, 2015: 961. DOI 10.1109/CVPR.2015.7298698.

[7] Carreira, Joao; Zisserman, Andrew. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 4724-4733.

[8] Cuturi, M. Proceedings of Machine Learning Research, vol. 70, 2017.

[9] Damen, Dima; Doughty, Hazel; Farinella, Giovanni Maria; Fidler, Sanja; Furnari, Antonino; Kazakos, Evangelos; Moltisanti, Davide; Munro, Jonathan; Perrett, Toby; Price, Will; Wray, Michael. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. Computer Vision - ECCV 2018, Part IV, vol. 11208, 2018: 753-771.

[10] Ding, Li; Xu, Chenliang. Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 6508-6516.
