MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition

Cited by: 6
Authors
Gui, Shuangchun [1]
Wang, Zhenkun [1,2]
Chen, Jixiang [1]
Zhou, Xun [3]
Zhang, Chen [4,5]
Cao, Yi [4,5]
Affiliations
[1] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[2] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[4] Southern Univ Sci & Technol, Shenzhen Peoples Hosp 3, Affiliated Hosp 2, Dept Radiol, Shenzhen 518112, Peoples R China
[5] Natl Clin Res Ctr Infect Dis, Shenzhen 518112, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Surgery; Task analysis; Transformers; Context modeling; Training; Feature extraction; Multitasking; Surgical activity recognition; knowledge distillation; multi-label image classification; WORKFLOW RECOGNITION; PHASE RECOGNITION; SURGICAL VIDEOS;
DOI
10.1109/TMI.2023.3345736
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The recognition of surgical triplets plays a critical role in the practical application of surgical videos. It involves the sub-tasks of recognizing instruments, verbs, and targets, while establishing precise associations between them. Existing methods face two significant challenges in triplet recognition: 1) the imbalanced class distribution of surgical triplets may lead to spurious task association learning, and 2) the feature extractors cannot reconcile local and global context modeling. To overcome these challenges, this paper presents a novel multi-teacher knowledge distillation framework for multi-task triplet learning, known as MT4MTL-KD. MT4MTL-KD leverages teacher models trained on less imbalanced sub-tasks to assist multi-task student learning for triplet recognition. Moreover, we adopt different categories of backbones for the teacher and student models, facilitating the integration of local and global context modeling. To further align the semantic knowledge between the triplet task and its sub-tasks, we propose a novel feature attention module (FAM). This module utilizes attention mechanisms to assign multi-task features to specific sub-tasks. We evaluate the performance of MT4MTL-KD on both the 5-fold cross-validation and the CholecTriplet challenge splits of the CholecT45 dataset. The experimental results consistently demonstrate the superiority of our framework over state-of-the-art methods, achieving significant improvements of up to 6.4% on the cross-validation split.
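The distillation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss, so the sigmoid-based soft-target term, the names `soft_bce`, `hard_bce`, and `multi_teacher_kd_loss`, and the parameters `alpha` and `temperature` are all assumptions made for illustration. The sketch assumes one teacher per sub-task (instrument, verb, target) and a multi-task student that also predicts triplets.

```python
import math

EPS = 1e-12  # guards against log(0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_bce(logits, labels):
    """Mean binary cross-entropy of logits against 0/1 multi-label targets."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS)
    return total / len(logits)


def soft_bce(student_logits, teacher_logits, temperature=1.0):
    """Mean cross-entropy between temperature-softened teacher and student probabilities."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_s = sigmoid(s / temperature)
        p_t = sigmoid(t / temperature)
        total -= p_t * math.log(p_s + EPS) + (1 - p_t) * math.log(1 - p_s + EPS)
    return total / len(student_logits)


def multi_teacher_kd_loss(student_out, teacher_out, triplet_labels,
                          alpha=0.5, temperature=2.0):
    """Triplet task loss plus one distillation term per sub-task teacher.

    student_out: dict with keys "triplet", "instrument", "verb", "target" (logit lists).
    teacher_out: dict with one logit list per sub-task teacher (hypothetical layout).
    """
    task_loss = hard_bce(student_out["triplet"], triplet_labels)
    distill = sum(
        soft_bce(student_out[name], logits, temperature)
        for name, logits in teacher_out.items()
    )
    return task_loss + alpha * distill


# Toy example with made-up logits: 2 classes per sub-task, 3 triplet classes.
student = {
    "triplet": [1.5, -0.5, -2.0],
    "instrument": [2.0, -1.0],
    "verb": [0.5, 0.2],
    "target": [-1.0, 1.0],
}
teachers = {
    "instrument": [3.0, -2.0],
    "verb": [1.0, -1.0],
    "target": [-2.0, 2.0],
}
loss = multi_teacher_kd_loss(student, teachers, triplet_labels=[1, 0, 0])
```

In this sketch each sub-task head of the student is pulled toward the corresponding teacher's softened predictions, while the triplet head is supervised by the ground-truth labels; `alpha` trades off the two terms.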
Pages: 1628-1639
Number of pages: 12