An End-to-End Spatial-Temporal Transformer Model for Surgical Action Triplet Recognition

Authors
Zou, Xiaoyang [1 ]
Yu, Derong [1 ]
Tao, Rong [1 ]
Zheng, Guoyan [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Med Robot, Sch Biomed Engn, Dongchuan Rd, Shanghai, Peoples R China
Source
12TH ASIAN-PACIFIC CONFERENCE ON MEDICAL AND BIOLOGICAL ENGINEERING, VOL 2, APCMBE 2023 | 2024 / Vol. 104
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Surgical action triplet; Transformer; Self-attention; Auxiliary supervision;
DOI
10.1007/978-3-031-51485-2_14
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Surgical activity recognition plays an important role in computer-assisted surgery. Recently, the surgical action triplet, a combination of three components in the form <instrument, verb, target>, has become the representative definition of fine-grained surgical activity. In this work, we propose an end-to-end spatial-temporal transformer model trained with multi-task auxiliary supervision, establishing a powerful baseline for surgical action triplet recognition. Rigorous experiments are conducted on the publicly available CholecT45 dataset for ablation studies and comparisons with state-of-the-art methods. Experimental results show that our method outperforms state-of-the-art methods by 6.8%, achieving 36.5% mAP for triplet recognition. Our method also won 2nd place in the action triplet recognition track of the CholecTriplet 2022 Challenge, which further demonstrates its superior capability.
Pages: 114-120
Page count: 7