Connectionist Temporal Modeling for Weakly Supervised Action Labeling

Cited by: 117
Authors
Huang, De-An [1]
Li, Fei-Fei [1]
Niebles, Juan Carlos [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
COMPUTER VISION - ECCV 2016, PT IV | 2016 / Vol. 9908
DOI
10.1007/978-3-319-46493-0_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a weakly supervised framework for action labeling in video, where only the order of the occurring actions is required during training. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework, which efficiently evaluates all possible alignments via dynamic programming and explicitly enforces their consistency with frame-to-frame visual similarities. This protects the model from being distracted by visually inconsistent or degenerate alignments without the need for temporal supervision. We further extend our framework to the semi-supervised case in which a few frames are sparsely annotated in a video. With less than 1% of labeled frames per video, our method outperforms existing semi-supervised approaches and achieves performance comparable to that of fully supervised approaches.
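The idea of evaluating all possible alignments via dynamic programming can be illustrated with a minimal sketch. It is not the authors' ECTC implementation: it omits the frame-to-frame similarity weighting and the blank label of standard CTC, and only shows a plain forward recursion that sums over every monotonic assignment of an ordered action list to frames. The function names `log_add` and `order_log_likelihood`, and the input layout `log_probs[t][k]` (log-probability of action `k` at frame `t`), are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def order_log_likelihood(log_probs, order):
    """Log-probability of an ordered action list, summed over all alignments.

    An alignment assigns exactly one action to each frame, visits the actions
    in the given order, and gives every action at least one frame.
    Hypothetical helper: a CTC-style forward pass without blanks or
    similarity weighting.
    """
    T, S = len(log_probs), len(order)
    if S == 0 or T < S:
        return NEG_INF  # no valid alignment exists

    # alpha[s]: log-mass of all partial alignments that end in action s
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][order[0]]

    for t in range(1, T):
        new_alpha = [NEG_INF] * S
        for s in range(S):
            stay = alpha[s]                              # same action continues
            advance = alpha[s - 1] if s > 0 else NEG_INF  # next action starts
            total = log_add(stay, advance)
            if total != NEG_INF:
                new_alpha[s] = total + log_probs[t][order[s]]
        alpha = new_alpha

    return alpha[S - 1]  # every alignment must end in the last action
```

With `T` frames and `S` ordered actions the recursion costs O(T·S) time, although the number of alignments it sums over grows combinatorially; this is the efficiency gain that dynamic programming provides in this setting.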
Pages: 137-153
Page count: 17