Alleviating Over-segmentation Errors by Detecting Action Boundaries

Cited by: 83
Authors
Ishikawa, Yuchi [1, 2]
Kasai, Seito [1, 2]
Aoki, Yoshimitsu [2]
Kataoka, Hirokatsu [1]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan
[2] Keio Univ, Tokyo, Japan
Source
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 | 2021
DOI
10.1109/WACV48630.2021.00237
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptive field. The ASB classifies video frames with action classes, while the BRB regresses the action boundary probabilities. The action boundaries predicted by the BRB refine the output from the ASB, which results in a significant performance improvement. Our contributions are three-fold: (i) We propose a framework for temporal action segmentation, the ASRF, which divides temporal action segmentation into frame-wise action classification and action boundary regression. Our framework refines frame-level hypotheses of action classes using predicted action boundaries. (ii) We propose a loss function for smoothing the transition of action probabilities, and analyze combinations of various loss functions for temporal action segmentation. (iii) Our framework outperforms state-of-the-art methods on three challenging datasets, offering an improvement of up to 13.7% in terms of segmental edit distance and up to 16.1% in terms of segmental F1 score. Our code is publicly available.
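The abstract states that predicted action boundaries refine the frame-wise output, but this record does not spell out the refinement rule. The sketch below is one plausible reading, not the authors' implementation: it assumes boundaries are local maxima of the boundary probability above a threshold, and that every frame between two consecutive boundaries takes the majority label of that span. The function names `detect_boundaries` and `refine` and the default threshold of 0.5 are hypothetical choices made for illustration.

```python
from collections import Counter


def detect_boundaries(boundary_probs, threshold=0.5):
    """Return frame indices treated as segment starts.

    Assumption: a boundary is a local maximum of the boundary
    probability that is at least `threshold`. Frame 0 always
    starts the first segment.
    """
    n = len(boundary_probs)
    boundaries = [0]
    for t in range(1, n):
        p = boundary_probs[t]
        left = boundary_probs[t - 1]
        right = boundary_probs[t + 1] if t + 1 < n else float("-inf")
        # Strict on the left, non-strict on the right, so a plateau
        # of equal values yields a single boundary at its first frame.
        if p >= threshold and p > left and p >= right:
            boundaries.append(t)
    return boundaries


def refine(frame_labels, boundary_probs, threshold=0.5):
    """Relabel each span between boundaries with its majority label."""
    if len(frame_labels) != len(boundary_probs):
        raise ValueError("labels and boundary probabilities differ in length")
    if not frame_labels:
        return []
    boundaries = detect_boundaries(boundary_probs, threshold)
    ends = boundaries[1:] + [len(frame_labels)]
    refined = []
    for start, end in zip(boundaries, ends):
        majority = Counter(frame_labels[start:end]).most_common(1)[0][0]
        refined.extend([majority] * (end - start))
    return refined


# Toy input: noisy frame-wise labels with one clear boundary at frame 4.
frame_labels = ["pour", "pour", "stir", "pour", "stir", "stir", "pour", "stir"]
boundary_probs = [0.1, 0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.1]
refined_labels = refine(frame_labels, boundary_probs)
```

In the toy input the frame-wise labels flip between classes several times (an over-segmentation error), while the boundary signal has a single peak. Voting within the two spans removes the spurious short segments without moving the detected boundary.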
Pages: 2321-2330
Page count: 10