Transformer-Based Cascade U-shaped Network for Action Segmentation

Cited by: 0
Authors
Bao, Wenxia [1 ]
Lin, An [1 ]
Huang, Hua [2 ]
Yang, Xianjun [3 ]
Chen, Hemu [4 ]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei, Peoples R China
[2] China Tobacco Zhejiang Ind Co Ltd, Hangzhou, Zhejiang, Peoples R China
[3] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei, Peoples R China
[4] Anhui Med Univ, Affiliated Hosp 1, Hefei, Peoples R China
Source
2024 3RD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND MEDIA COMPUTING, ICIPMC 2024 | 2024
Keywords
Action Segmentation; Transformer; U-net
DOI
10.1109/ICIPMC62364.2024.10586708
CLC classification
TP39 [Computer applications]
Subject classification
081203; 0835
Abstract
Action segmentation requires predicting the action occurring in each frame of a video. Existing methods tend to focus on the global relationships within the sequence while ignoring contextual information at different granularities. To address this problem, this paper proposes a Transformer-based cascaded U-shaped network for action segmentation. The method adopts a cascaded Transformer structure in which the feature sequences between the encoder and decoder are connected in a U-shape, fully combining global context with the local context between neighboring frames. Dilated temporal convolutions and a local-window attention mechanism are used to enhance the model's ability to perceive long-range action interactions. The proposed method outperforms current mainstream action segmentation methods on two challenging datasets, 50Salads and GTEA.
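The abstract names three building blocks: dilated temporal convolutions, local-window attention, and U-shaped encoder-decoder feature connections over per-frame features. The sketch below is a minimal PyTorch illustration of how one such stage could be assembled; the layer sizes, window size, depth, skip-fusion scheme, and the single-stage simplification (the paper cascades multiple stages) are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the components described in the abstract, assuming
# standard 2048-d frame-wise features (e.g. I3D) as input. Hyperparameters
# and the fusion scheme are illustrative guesses, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DilatedTemporalConv(nn.Module):
    """Residual 1D convolution with a configurable dilation rate."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):                              # x: (B, C, T)
        return x + self.out(F.relu(self.conv(x)))


class LocalWindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping temporal windows."""
    def __init__(self, channels, window=64, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                              # x: (B, C, T)
        b, c, t = x.shape
        pad = (-t) % self.window
        h = F.pad(x, (0, pad)).transpose(1, 2)         # (B, T+pad, C)
        h = h.reshape(-1, self.window, c)              # windows as batch
        h, _ = self.attn(h, h, h)
        h = h.reshape(b, t + pad, c)[:, :t].transpose(1, 2)
        return x + h


class UShapedStage(nn.Module):
    """One encoder-decoder stage with U-shaped skip connections."""
    def __init__(self, in_dim, channels, num_classes, depth=4):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, channels, 1)
        self.enc = nn.ModuleList(
            [nn.Sequential(DilatedTemporalConv(channels, 2 ** i),
                           LocalWindowAttention(channels))
             for i in range(depth)])
        self.dec = nn.ModuleList(
            [nn.Sequential(nn.Conv1d(2 * channels, channels, 1),
                           DilatedTemporalConv(channels, 2 ** (depth - 1 - i)))
             for i in range(depth)])
        self.cls = nn.Conv1d(channels, num_classes, 1)

    def forward(self, x):                              # x: (B, in_dim, T)
        h = self.inp(x)
        skips = []
        for layer in self.enc:                         # encoder path
            h = layer(h)
            skips.append(h)
        for layer, skip in zip(self.dec, reversed(skips)):
            h = layer(torch.cat([h, skip], dim=1))     # U-shaped skip fusion
        return self.cls(h)                             # per-frame logits (B, K, T)


if __name__ == "__main__":
    # Hypothetical usage: a 512-frame clip with 19 action classes.
    model = UShapedStage(in_dim=2048, channels=64, num_classes=19)
    logits = model(torch.randn(1, 2048, 512))
    print(logits.shape)                                # torch.Size([1, 19, 512])
```

In a cascaded design, several such stages would typically be stacked, with each stage refining the frame-wise predictions of the previous one.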
Pages: 157-161
Page count: 5