Self-supervised learning of monocular depth and ego-motion estimation for non-rigid scenes in wireless capsule endoscopy videos

Cited by: 0
Authors
Liao, Chao [1 ,2 ]
Wang, Chengliang [2 ]
Wang, Peng [2 ]
Wu, Hao [2 ]
Wang, Hongqian [2 ]
Affiliations
[1] Chongqing Univ, Coll Comp Sci, Chongqing, Peoples R China
[2] Army Med Univ, Southwest Hosp, Chongqing, Peoples R China
Keywords
Wireless capsule endoscopy (WCE) images; Monocular depth estimation; Ego-motion estimation; Non-rigid scenes; Transformer; Cancer
DOI
10.1016/j.bspc.2024.105978
CLC number
R318 [Biomedical Engineering]
Subject classification code
0831
Abstract
Background and objective: Gastrointestinal (GI) cancers represent the most widespread type of cancer worldwide. Wireless capsule endoscopy (WCE), an innovative, capsule-sized endoscope, has the potential to revolutionize both the diagnosis and treatment of GI cancers as well as other GI diseases by offering patients a less invasive and more comfortable option. Nonetheless, WCE videos frequently display non-rigid transformations and brightness fluctuations, rendering prior simultaneous localization and mapping (SLAM) approaches unfeasible. Depth can assist in recognizing and monitoring potential obstructions or anomalies when localization is required.
Methods: In this paper, we present a self-supervised model, SfMLearner-WCE, specifically designed for estimating depth and ego motion in WCE videos. Our approach incorporates a pose estimation network and a Transformer network with a global self-attention mechanism. To ensure high-quality depth and pose estimation, we propose learnable binary per-pixel masks to eliminate misaligned image regions arising from non-rigid transformations or significant changes in lighting. Additionally, we introduce multi-interval frame sampling to enhance training data diversity, coupled with long-term pose consistency regularization.
Results: We present a comprehensive evaluation of the performance of SfMLearner-WCE in comparison with five state-of-the-art self-supervised SLAM methods. Our proposed approach is rigorously assessed on three WCE datasets. The experimental results demonstrate that our approach achieves high-quality depth estimation and high-precision ego-motion estimation for non-rigid scenes in WCE videos, outperforming other self-supervised SLAM methods. In the quantitative evaluation of depth estimation on the ColonDepth dataset, an absolute relative error of 0.232 was observed. Additionally, in the quantitative assessment of ego-motion estimation on the ColonSim dataset, a translation drift percentage of 43.176% was achieved at a frame rate of 2 frames per second.
Conclusions: The experimental analysis conducted in this study offers evidence of the effectiveness and robustness of our proposed method, SfMLearner-WCE, in non-rigid scenes of WCE videos. SfMLearner-WCE assists in enhancing diagnostic efficiency, enabling physicians to navigate and analyze WCE videos more effectively, benefiting patient outcomes. Our code will be released at https://github.com/fisherliaoc/SfMLearner-WCE.
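The masking idea described in the Methods section can be illustrated with a toy sketch. The snippet below is not the paper's implementation; it is a minimal, generic example (in numpy, for a single grayscale frame pair) of a photometric reconstruction loss weighted by a per-pixel validity mask, with a regularizer that pushes mask values toward 1 so that only genuinely misaligned pixels (e.g. from non-rigid deformation or lighting changes) are suppressed. The function name, the L1 photometric term, and the regularization weight are all illustrative assumptions.

```python
import numpy as np

def masked_photometric_loss(target, warped, mask, reg_weight=0.2):
    """Toy masked photometric loss for self-supervised depth training.

    target, warped : (H, W) grayscale frames; `warped` is the source frame
                     re-projected into the target view via predicted depth
                     and pose (the warping itself is omitted here).
    mask           : (H, W) values in [0, 1]; low values down-weight pixels
                     the network deems unexplainable by rigid motion.
    """
    photo = np.abs(target - warped)               # per-pixel reprojection error
    data_term = np.mean(mask * photo)             # masked photometric term
    # Cross-entropy-style prior keeping the mask close to 1, so the trivial
    # all-zero mask (which would zero the loss) is penalized.
    reg_term = -reg_weight * np.mean(np.log(mask + 1e-8))
    return data_term + reg_term

# Toy usage: identical frames with a fully valid mask give near-zero loss.
t = np.random.rand(4, 4)
m = np.ones((4, 4))
loss = masked_photometric_loss(t, t.copy(), m)
```

Without the regularizer, the network could minimize the loss by predicting an all-zero mask; the log-penalty makes discarding a pixel costly, which is the standard trade-off behind explainability-style masks in self-supervised structure-from-motion.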
Pages: 15
Related papers
50 items in total
  • [1] Self-Supervised Learning of Non-Rigid Residual Flow and Ego-Motion
    Tishchenko, Ivan
    Lombardi, Sandro
    Oswald, Martin R.
    Pollefeys, Marc
    2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 150 - 159
  • [2] Self-Supervised monocular depth and ego-Motion estimation in endoscopy: Appearance flow to the rescue
    Shao, Shuwei
    Pei, Zhongcai
    Chen, Weihai
    Zhu, Wentao
    Wu, Xingming
    Sun, Dianmin
    Zhang, Baochang
    MEDICAL IMAGE ANALYSIS, 2022, 77
  • [3] Self-Supervised Attention Learning for Depth and Ego-motion Estimation
    Sadek, Assem
    Chidlovskii, Boris
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 10054 - 10060
  • [4] Self-supervised monocular depth and ego-motion estimation for CT-bronchoscopy fusion
    Chang, Qi
    Higgins, William E.
    IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, MEDICAL IMAGING 2024, 2024, 12928
  • [5] Semantic and Optical Flow Guided Self-supervised Monocular Depth and Ego-Motion Estimation
    Fang, Jiaojiao
    Liu, Guizhong
    IMAGE AND GRAPHICS (ICIG 2021), PT III, 2021, 12890 : 465 - 477
  • [6] Self-supervised Monocular Pose and Depth Estimation for Wireless Capsule Endoscopy with Transformers
    Nazifi, Nahid
    Araujo, Helder
    Erabati, Gopi Krishna
    Tahri, Omar
    IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, MEDICAL IMAGING 2024, 2024, 12928
  • [7] WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters
    Lou, Ange
    Noble, Jack
    IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, MEDICAL IMAGING 2024, 2024, 12928
  • [8] Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion
    Vasiljevic, Igor
    Guizilini, Vitor
    Ambrus, Rares
    Pillai, Sudeep
    Burgard, Wolfram
    Shakhnarovich, Greg
    Gaidon, Adrien
    2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 1 - 11
  • [9] Joint self-supervised learning of interest point, descriptor, depth, and ego-motion from monocular video
    Wang, Zhongyi
    Shen, Mengjiao
    Chen, Qijun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (32) : 77529 - 77547
  • [10] Self-supervised Learning for Dense Depth Estimation in Monocular Endoscopy
    Liu, Xingtong
    Sinha, Ayushi
    Unberath, Mathias
    Ishii, Masaru
    Hager, Gregory D.
    Taylor, Russell H.
    Reiter, Austin
    OR 2.0 CONTEXT-AWARE OPERATING THEATERS, COMPUTER ASSISTED ROBOTIC ENDOSCOPY, CLINICAL IMAGE-BASED PROCEDURES, AND SKIN IMAGE ANALYSIS, OR 2.0 2018, 2018, 11041 : 128 - 138