Contrastive Transformer-Based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection

Cited by: 18
Authors
Tian, Yu [1, 2, 4]
Pang, Guansong [3]
Liu, Fengbei [1]
Liu, Yuyuan [1]
Wang, Chong [1]
Chen, Yuanhong [1]
Verjans, Johan [2]
Carneiro, Gustavo [1]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, Australia
[2] South Australian Hlth & Med Res Inst, Adelaide, Australia
[3] Singapore Management Univ, Singapore, Singapore
[4] Harvard Med Sch, Boston, MA USA
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT III | 2022 / Vol. 13433
Funding
Australian Research Council
Keywords
Polyp detection; Colonoscopy; Weakly-supervised learning; Video anomaly detection; Vision transformer
DOI
10.1007/978-3-031-16437-8_9
Chinese Library Classification number
R445 [Diagnostic imaging]
Discipline classification code
100207
Abstract
Current polyp detection methods from colonoscopy videos use exclusively normal (i.e., healthy) training images, which i) ignore the importance of temporal information in consecutive video frames, and ii) lack knowledge about the polyps. Consequently, they often have high detection errors, especially on challenging polyp cases (e.g., small, flat, or partially visible polyps). In this work, we formulate polyp detection as a weakly-supervised anomaly detection task that uses video-level labelled training data to detect frame-level polyps. In particular, we propose a novel convolutional transformer-based multiple instance learning method designed to identify abnormal frames (i.e., frames with polyps) from anomalous videos (i.e., videos containing at least one frame with polyp). In our method, local and global temporal dependencies are seamlessly captured while we simultaneously optimise video and snippet-level anomaly scores. A contrastive snippet mining method is also proposed to enable an effective modelling of the challenging polyp cases. The resulting method achieves a detection accuracy that is substantially better than current state-of-the-art approaches on a new large-scale colonoscopy video dataset introduced in this work.
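The weak-supervision setting in the abstract (video-level labels at training time, frame-level decisions at test time) can be illustrated with a generic multiple instance learning sketch. This is not the authors' method: top-k mean pooling, the binary cross-entropy loss, the 0.5 threshold, the function names, and the example scores are all assumptions chosen for illustration, and the transformer and contrastive snippet mining components are omitted.

```python
import math

def video_score(snippet_scores, k=3):
    """Top-k MIL pooling (assumed): mean of the k highest snippet scores.

    Expects a non-empty list of scores in [0, 1].
    """
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)

def bce(pred, label, eps=1e-7):
    """Binary cross-entropy between a video score and its 0/1 video-level label."""
    pred = min(max(pred, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

def flag_snippets(snippet_scores, threshold=0.5):
    """At test time, return indices of snippets whose score reaches the threshold."""
    return [i for i, s in enumerate(snippet_scores) if s >= threshold]

# Hypothetical per-snippet anomaly scores for two videos.
abnormal_video = [0.1, 0.2, 0.9, 0.8, 0.1]  # video-level label 1 (contains a polyp)
normal_video = [0.1, 0.2, 0.1, 0.3, 0.2]    # video-level label 0

# Training signal: only the video-level label supervises the pooled score.
loss = bce(video_score(abnormal_video, k=2), 1) + bce(video_score(normal_video, k=2), 0)

# Inference: the same snippet scores give frame-level (snippet-level) detections.
detections = flag_snippets(abnormal_video)
```

Because the loss sees only the pooled score, the model is pushed to raise the scores of the few most suspicious snippets in an abnormal video, which is what allows frame-level detection without frame-level labels.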
Pages: 88-98
Page count: 11