Transformer-Based Cascade U-shaped Network for Action Segmentation

Cited by: 0
Authors
Bao, Wenxia [1 ]
Lin, An [1 ]
Huang, Hua [2 ]
Yang, Xianjun [3 ]
Chen, Hemu [4 ]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei, Peoples R China
[2] China Tobacco Zhejiang Ind Co Ltd, Hangzhou, Zhejiang, Peoples R China
[3] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei, Peoples R China
[4] Anhui Med Univ, Affiliated Hosp 1, Hefei, Peoples R China
Source
2024 3RD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND MEDIA COMPUTING, ICIPMC 2024 | 2024
Keywords
Action Segmentation; Transformer; U-net
DOI
10.1109/ICIPMC62364.2024.10586708
CLC classification
TP39 [Computer applications]
Subject classification
081203; 0835
Abstract
Action segmentation requires predicting the action occurring in each frame of a video. Existing methods tend to focus on the global relationships within the sequence while ignoring contextual information at different granularities. To address this problem, this paper proposes a Transformer-based cascaded U-shaped network for action segmentation. The method adopts a cascaded Transformer structure in which the feature sequences between the encoder and decoder are connected in a U-shape, fully combining global context with the local context between neighboring frames. Dilated temporal convolutions and a local-window attention mechanism are used to enhance the model's ability to perceive long-range action interactions. The proposed method outperforms current mainstream action segmentation methods on two challenging datasets, 50Salads and GTEA.
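The abstract names three building blocks: dilated temporal convolutions, local-window attention, and U-shaped encoder-decoder feature connections over per-frame features. The sketch below is a minimal PyTorch illustration of how one such stage could be assembled; the layer sizes, window size, depth, skip-fusion scheme, and the single-stage simplification (the paper cascades multiple stages) are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the components described in the abstract, assuming
# standard 2048-d frame-wise features (e.g. I3D) as input. Hyperparameters
# and the fusion scheme are illustrative guesses, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DilatedTemporalConv(nn.Module):
    """Residual 1D convolution with a configurable dilation rate."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):                              # x: (B, C, T)
        return x + self.out(F.relu(self.conv(x)))


class LocalWindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping temporal windows."""
    def __init__(self, channels, window=64, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                              # x: (B, C, T)
        b, c, t = x.shape
        pad = (-t) % self.window
        h = F.pad(x, (0, pad)).transpose(1, 2)         # (B, T+pad, C)
        h = h.reshape(-1, self.window, c)              # windows as batch
        h, _ = self.attn(h, h, h)
        h = h.reshape(b, t + pad, c)[:, :t].transpose(1, 2)
        return x + h


class UShapedStage(nn.Module):
    """One encoder-decoder stage with U-shaped skip connections."""
    def __init__(self, in_dim, channels, num_classes, depth=4):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, channels, 1)
        self.enc = nn.ModuleList(
            [nn.Sequential(DilatedTemporalConv(channels, 2 ** i),
                           LocalWindowAttention(channels))
             for i in range(depth)])
        self.dec = nn.ModuleList(
            [nn.Sequential(nn.Conv1d(2 * channels, channels, 1),
                           DilatedTemporalConv(channels, 2 ** (depth - 1 - i)))
             for i in range(depth)])
        self.cls = nn.Conv1d(channels, num_classes, 1)

    def forward(self, x):                              # x: (B, in_dim, T)
        h = self.inp(x)
        skips = []
        for layer in self.enc:                         # encoder path
            h = layer(h)
            skips.append(h)
        for layer, skip in zip(self.dec, reversed(skips)):
            h = layer(torch.cat([h, skip], dim=1))     # U-shaped skip fusion
        return self.cls(h)                             # per-frame logits (B, K, T)


if __name__ == "__main__":
    # Hypothetical usage: a 512-frame clip with 19 action classes.
    model = UShapedStage(in_dim=2048, channels=64, num_classes=19)
    logits = model(torch.randn(1, 2048, 512))
    print(logits.shape)                                # torch.Size([1, 19, 512])
```

In a cascaded design, several such stages would typically be stacked, with each stage refining the frame-wise predictions of the previous one.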
Pages: 157-161
Page count: 5