Maximization and restoration: Action segmentation through dilation passing and temporal reconstruction

Cited by: 22
Authors
Park, Junyong [1]
Kim, Daekyum [1,2]
Huh, Sejoon [1]
Jo, Sungho [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon 34141, South Korea
[2] Harvard Univ, John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Funding
National Research Foundation, Singapore;
Keywords
Action segmentation; Temporal segmentation; Video understanding;
DOI
10.1016/j.patcog.2022.108764
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action segmentation aims to split videos into segments of different actions. Recent work focuses on dealing with long-range dependencies of long, untrimmed videos, but still suffers from over-segmentation and performance saturation due to increased model complexity. This paper addresses the aforementioned issues through a divide-and-conquer strategy that first maximizes the frame-wise classification accuracy of the model and then reduces the over-segmentation errors. This strategy is implemented with the Dilation Passing and Reconstruction Network, composed of the Dilation Passing Network, which primarily aims to increase accuracy by propagating information of different dilations, and the Temporal Reconstruction Network, which reduces over-segmentation errors by temporally encoding and decoding the output features from the Dilation Passing Network. We also propose a weighted temporal mean squared error loss that further reduces over-segmentation. Through evaluations on the 50Salads, GTEA, and Breakfast datasets, we show that our model achieves significant results compared to existing state-of-the-art models. (C) 2022 Elsevier Ltd. All rights reserved.
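The abstract mentions a weighted temporal mean squared error loss for reducing over-segmentation but does not give its formula. As a rough illustration only, the sketch below shows the general idea of a temporal smoothing loss of this kind: it penalizes large changes in per-class log-probabilities between adjacent frames, with the difference truncated at a threshold. The function name `weighted_temporal_mse`, the per-transition `weights` argument, the threshold `tau`, and the normalization by (T - 1) * C are all assumptions made for this example, not the authors' definition.

```python
import math


def weighted_temporal_mse(probs, weights=None, tau=4.0):
    """Illustrative temporal smoothing loss (NOT the paper's exact formula).

    probs   : list of T frames, each a list of C class probabilities (> 0).
    weights : optional list of T - 1 non-negative weights, one per
              frame-to-frame transition (hypothetical weighting scheme).
    tau     : truncation threshold on the absolute log-probability change.
    """
    num_frames = len(probs)
    if num_frames < 2:
        return 0.0  # no transitions to penalize
    num_classes = len(probs[0])
    if weights is None:
        weights = [1.0] * (num_frames - 1)
    if len(weights) != num_frames - 1:
        raise ValueError("weights must have one entry per frame transition")

    total = 0.0
    for t in range(1, num_frames):
        for c in range(num_classes):
            # Change in log-probability of class c between adjacent frames.
            delta = abs(math.log(probs[t][c]) - math.log(probs[t - 1][c]))
            # Truncate so that genuine action boundaries are not over-penalized.
            delta = min(delta, tau)
            total += weights[t - 1] * delta * delta
    # Average over all transitions and classes (assumed normalization).
    return total / ((num_frames - 1) * num_classes)


if __name__ == "__main__":
    smooth = [[0.9, 0.1]] * 4
    flicker = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
    print(weighted_temporal_mse(smooth))
    print(weighted_temporal_mse(flicker))
```

Under these assumptions, a prediction sequence that flickers between classes receives a higher loss than a temporally smooth one, which is the behavior an over-segmentation penalty is meant to encourage.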
Pages: 10