E-DSSR: Efficient Dynamic Surgical Scene Reconstruction with Transformer-Based Stereoscopic Depth Perception

Cited by: 40
Authors
Long, Yonghao [1]
Li, Zhaoshuo [2]
Yee, Chi Hang [3]
Ng, Chi Fai [3]
Taylor, Russell H. [2]
Unberath, Mathias [2]
Dou, Qi [1,4]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China
[2] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[3] Chinese Univ Hong Kong, SH Ho Urol Ctr, Dept Surg, Shatin, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, T Stone Robot Inst, Shatin, Hong Kong, Peoples R China
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT IV | 2021 / Vol. 12904
Keywords
Dynamic surgical scene reconstruction; Transformer-based depth estimation; Stereo image perception
DOI
10.1007/978-3-030-87202-1_40
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reconstructing the scene of robotic surgery from stereo endoscopic video is an important and promising topic in surgical data science, which potentially supports many applications such as surgical visual perception, robotic surgery education and intra-operative context awareness. However, current methods are mostly restricted to reconstructing static anatomy, assuming no tissue deformation, no tool occlusion or de-occlusion, and no camera movement. These assumptions are not always satisfied in minimally invasive robotic surgeries. In this work, we present an efficient reconstruction pipeline for highly dynamic surgical scenes that runs at 28 fps. Specifically, we design a transformer-based stereoscopic depth perception method for efficient depth estimation and a lightweight tool segmentor to handle tool occlusion. We then propose a dynamic reconstruction algorithm that estimates tissue deformation and camera movement and aggregates information over time for surgical scene reconstruction. We evaluate the proposed pipeline on two datasets: the public Hamlyn Centre Endoscopic Video Dataset and our in-house DaVinci robotic surgery dataset. The results demonstrate that our method can recover the scene obstructed by the surgical tool and handle camera movement effectively in realistic surgical scenarios at real-time speed.
Pages: 415-425
Page count: 11