Alleviating Over-segmentation Errors by Detecting Action Boundaries

Cited by: 83

Authors:

Ishikawa, Yuchi [1,2]

Kasai, Seito [1,2]

Aoki, Yoshimitsu [2]

Kataoka, Hirokatsu [1]

Affiliations:

[1] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan

[2] Keio Univ, Tokyo, Japan

Source:

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 | 2021

Keywords:

DOI:

10.1109/WACV48630.2021.00237

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptive field. The ASB classifies video frames with action classes, while the BRB regresses the action boundary probabilities. The action boundaries predicted by the BRB refine the output from the ASB, which results in a significant performance improvement. Our contributions are three-fold: (i) We propose a framework for temporal action segmentation, the ASRF, which divides temporal action segmentation into frame-wise action classification and action boundary regression. Our framework refines frame-level hypotheses of action classes using predicted action boundaries. (ii) We propose a loss function for smoothing the transition of action probabilities, and analyze combinations of various loss functions for temporal action segmentation. (iii) Our framework outperforms state-of-the-art methods on three challenging datasets, offering an improvement of up to 13.7% in terms of segmental edit distance and up to 16.1% in terms of segmental F1 score. Our code is publicly available.
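The refinement step described in the abstract (frame-level class hypotheses corrected with predicted action boundaries) can be illustrated with a minimal sketch. The abstract does not spell out the refinement rule, so everything specific here is an assumption made for illustration: boundaries are taken as local maxima of the boundary probability above a hypothetical `threshold` of 0.5, and each segment between boundaries is relabeled by majority vote. The function names `find_boundaries` and `refine` are likewise hypothetical and are not taken from the authors' released code.

```python
from collections import Counter


def find_boundaries(boundary_probs, threshold=0.5):
    """Return segment start indices: frame 0 plus every interior local
    maximum of boundary_probs that exceeds threshold (assumed rule)."""
    boundaries = [0]
    for t in range(1, len(boundary_probs) - 1):
        p = boundary_probs[t]
        if p > threshold and p >= boundary_probs[t - 1] and p > boundary_probs[t + 1]:
            boundaries.append(t)
    return boundaries


def refine(frame_labels, boundary_probs, threshold=0.5):
    """Relabel every frame with the majority label of its segment
    (assumed rule; both inputs must have the same length)."""
    if not frame_labels:
        return []
    boundaries = find_boundaries(boundary_probs, threshold)
    ends = boundaries[1:] + [len(frame_labels)]
    refined = []
    for start, end in zip(boundaries, ends):
        majority = Counter(frame_labels[start:end]).most_common(1)[0][0]
        refined.extend([majority] * (end - start))
    return refined


# Toy example: an over-segmented frame-wise prediction (ASB-like output)
# and per-frame boundary probabilities (BRB-like output).
frame_labels = ["pour", "pour", "stir", "pour", "stir", "stir", "pour", "stir"]
boundary_probs = [0.9, 0.1, 0.1, 0.2, 0.8, 0.1, 0.1, 0.1]
refined_labels = refine(frame_labels, boundary_probs)
print(refined_labels)
```

In this toy input the frame-wise labels flip between classes six times, while the boundary probabilities indicate a single transition at frame 4. After majority voting within each segment, the output contains two segments, which is the kind of over-segmentation error the title refers to.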

Pages: 2321-2330

Page count: 10
