An End-to-End Spatial-Temporal Transformer Model for Surgical Action Triplet Recognition

Cited by: 0

Authors:

Zou, Xiaoyang [1]

Yu, Derong [1]

Tao, Rong [1]

Zheng, Guoyan [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Inst Med Robot, Sch Biomed Engn, Dongchuan Rd, Shanghai, Peoples R China

Source:

12TH ASIAN-PACIFIC CONFERENCE ON MEDICAL AND BIOLOGICAL ENGINEERING, VOL 2, APCMBE 2023 | 2024, Vol. 104

Funding:

National Natural Science Foundation of China;

Keywords:

Action recognition; Surgical action triplet; Transformer; Self-attention; Auxiliary supervision;

DOI:

10.1007/978-3-031-51485-2_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Surgical activity recognition plays an important role in computer-assisted surgery. Recently, the surgical action triplet has become the representative definition of fine-grained surgical activity; it combines three components in the form <instrument, verb, target>. In this work, we propose an end-to-end spatial-temporal transformer model trained with multi-task auxiliary supervision, establishing a powerful baseline for surgical action triplet recognition. Rigorous experiments are conducted on the publicly available CholecT45 dataset for ablation studies and comparisons with state-of-the-art methods. Experimental results show that our method outperforms state-of-the-art methods by 6.8%, achieving 36.5% mAP for triplet recognition. Our method also won 2nd place in the action triplet recognition track of the CholecTriplet 2022 Challenge, which further demonstrates its capability.

Pages: 114-120

Page count: 7

