D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

Cited by: 82
Authors
Chang, Chien-Yi [1]
Huang, De-An [1]
Sui, Yanan [1]
Li Fei-Fei [1]
Niebles, Juan Carlos [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00366
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We address weakly supervised action alignment and segmentation in videos, where only the order of occurring actions is available during training. We propose Discriminative Differentiable Dynamic Time Warping (D3TW), the first discriminative model using weak ordering supervision. The key technical challenge for discriminative modeling with weak supervision is that the loss function of the ordering supervision is usually formulated using dynamic programming and is thus not differentiable. We address this challenge with a continuous relaxation of the min-operator in dynamic programming and extend the alignment loss to be differentiable. The proposed D3TW innovatively solves sequence alignment with discriminative modeling and end-to-end training, which substantially improves the performance in weakly supervised action alignment and segmentation tasks. We show that our model is able to bypass the degenerated sequence problem usually encountered in previous work and outperform the current state-of-the-art across three evaluation metrics in two challenging datasets.
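The continuous relaxation of the min-operator mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the log-sum-exp smoothed minimum commonly used for soft dynamic time warping, and the names `soft_min`, `soft_dtw`, the smoothing parameter `gamma`, and the toy cost matrix are all hypothetical choices made for illustration. The discriminative part of the model (contrasting correct and incorrect orderings) is not shown.

```python
import numpy as np

def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))).

    It is differentiable in its inputs and approaches min(values) as gamma -> 0.
    (Assumed relaxation; the abstract does not specify its exact form.)
    """
    v = np.asarray(values, dtype=float)
    m = v.min()
    if not np.isfinite(m):
        return m  # all candidates are +inf: no valid predecessor
    # Subtract the minimum before exponentiating for numerical stability.
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))

def soft_dtw(cost, gamma=1.0):
    """DTW recursion over a frame-by-action cost matrix, with min replaced by soft_min."""
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Predecessors: step down, step right, or step diagonally.
            prev = [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]]
            R[i, j] = cost[i - 1, j - 1] + soft_min(prev, gamma)
    return R[n, m]

# Hypothetical 2x2 cost matrix: the diagonal alignment has zero cost.
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
near_hard = soft_dtw(cost, gamma=1e-3)  # close to the hard DTW value
smooth = soft_dtw(cost, gamma=1.0)      # smoother and strictly lower
```

Because the smoothed minimum never exceeds the true minimum, the relaxed alignment cost lower-bounds the hard DTW cost and tightens as `gamma` shrinks, while remaining differentiable with respect to every entry of the cost matrix.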
Pages: 3541-3550
Page count: 10