Self-supervised learning of monocular depth and ego-motion estimation for non-rigid scenes in wireless capsule endoscopy videos

Cited: 0
Authors
Liao, Chao [1 ,2 ]
Wang, Chengliang [2 ]
Wang, Peng [2 ]
Wu, Hao [2 ]
Wang, Hongqian [2 ]
Affiliations
[1] Chongqing Univ, Coll Comp Sci, Chongqing, Peoples R China
[2] Army Med Univ, Southwest Hosp, Chongqing, Peoples R China
Keywords
Wireless capsule endoscopy (WCE) images; Monocular depth estimation; Ego-motion estimation; Non-rigid scenes; Transformer; CANCER;
DOI
10.1016/j.bspc.2024.105978
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline code
0831
Abstract
Background and objective: Gastrointestinal (GI) cancers represent the most widespread type of cancer worldwide. Wireless capsule endoscopy (WCE), an innovative, capsule-sized endoscope, has the potential to revolutionize both the diagnosis and treatment of GI cancers as well as other GI diseases by offering patients a less invasive and more comfortable option. Nonetheless, WCE videos frequently display non-rigid transformations and brightness fluctuations, rendering prior simultaneous localization and mapping (SLAM) approaches unfeasible. Depth can assist in recognizing and monitoring potential obstructions or anomalies when localization is required. Methods: In this paper, we present a self-supervised model, SfMLearner-WCE, specifically designed for estimating depth and ego-motion in WCE videos. Our approach incorporates a pose estimation network and a Transformer network with a global self-attention mechanism. To ensure high-quality depth and pose estimation, we propose learnable binary per-pixel masks to eliminate misaligned image regions arising from non-rigid transformations or significant changes in lighting. Additionally, we introduce multi-interval frame sampling to enhance training data diversity, coupled with long-term pose consistency regularization. Results: We present a comprehensive evaluation of the performance of SfMLearner-WCE in comparison with five state-of-the-art self-supervised SLAM methods. Our proposed approach is rigorously assessed on three WCE datasets. The experimental results demonstrate that our approach achieves high-quality depth estimation and high-precision ego-motion estimation for non-rigid scenes in WCE videos, outperforming other self-supervised SLAM methods. In the quantitative evaluation of depth estimation on the ColonDepth dataset, an absolute relative error of 0.232 was observed.
Additionally, in the quantitative assessment of ego-motion estimation on the ColonSim dataset, a translation drift percentage of 43.176% was achieved at a frame rate of 2 frames per second. Conclusions: The experimental analysis conducted in this study offers evidence of the effectiveness and robustness of our proposed method, SfMLearner-WCE, in non-rigid scenes of WCE videos. SfMLearner-WCE helps enhance diagnostic efficiency, enabling physicians to navigate and analyze WCE videos more effectively, benefiting patient outcomes. Our code will be released at https://github.com/fisherliaoc/SfMLearner-WCE.
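The per-pixel masking idea described in the Methods paragraph can be sketched in a few lines. The following is a minimal NumPy illustration of a SfMLearner-style masked photometric loss, not the paper's actual implementation: the function name, the cross-entropy-style mask regularizer, and the `reg_weight` value are all assumptions. The mask downweights pixels where view synthesis fails (non-rigid deformation, lighting change), while the regularizer penalizes the trivial all-zero mask.

```python
import numpy as np

def masked_photometric_loss(target, warped, mask, reg_weight=0.2):
    """L1 photometric loss weighted by a per-pixel validity mask.

    target, warped : arrays of the same shape (target frame and the
        source frame warped into the target view).
    mask : per-pixel weights in (0, 1]; values near 0 exclude pixels
        suspected of non-rigid motion or illumination change.
    """
    photo = np.abs(target - warped)                 # per-pixel L1 residual
    data_term = np.mean(mask * photo)               # masked reconstruction error
    # Regularizer pushing the mask toward 1, so masking everything is penalized.
    reg_term = -np.mean(np.log(np.clip(mask, 1e-6, 1.0)))
    return data_term + reg_weight * reg_term
```

With a perfectly aligned pair and a full mask, both terms vanish and the loss is zero; misaligned pixels can be suppressed only at the cost of the regularization term.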
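The multi-interval frame sampling mentioned in the Methods paragraph can likewise be sketched. This is an illustrative reading of the idea, assuming it amounts to drawing (target, source) training pairs at several temporal gaps rather than only consecutive frames; the function name, interval set, and pair counts are hypothetical.

```python
import random

def sample_frame_pairs(num_frames, intervals=(1, 2, 4), pairs_per_interval=2, seed=0):
    """Sample (target, source) frame-index pairs at several temporal gaps.

    Wider gaps yield larger camera baselines between the two frames,
    diversifying the training pairs beyond adjacent frames.
    """
    rng = random.Random(seed)
    pairs = []
    for gap in intervals:
        for _ in range(pairs_per_interval):
            t = rng.randrange(0, num_frames - gap)  # valid target index
            pairs.append((t, t + gap))              # source trails by `gap`
    return pairs
```

In training, each sampled pair would feed the pose network and the photometric loss exactly as a consecutive pair would, so the change is confined to the data pipeline.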
Pages: 15
Related papers
50 records in total
  • [31] DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-motion from Monocular Videos
    Jiang, Hualie
    Ding, Laiyan
    Sun, Zhenglong
    Huang, Rui
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 10061 - 10067
  • [32] EDS-Depth: Enhancing Self-Supervised Monocular Depth Estimation in Dynamic Scenes
    Yu, Shangshu
    Wu, Meiqing
    Lam, Siew-Kei
    Wang, Changshuo
    Wang, Ruiping
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025,
  • [33] Self-supervised monocular image depth learning and confidence estimation
    Chen, Long
    Tang, Wen
    Wan, Tao Ruan
    John, Nigel W.
    NEUROCOMPUTING, 2020, 381 : 272 - 281
  • [34] Rectified self-supervised monocular depth estimation loss for nighttime and dynamic scenes
    Qin, Xiaofei
    Wang, Lin
    Zhu, Yongchao
    Mao, Fan
    Zhang, Xuedian
    He, Changxiang
    Dong, Qiulei
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 144
  • [35] Self-Supervised Multi-Frame Monocular Depth Estimation for Dynamic Scenes
    Wu, Guanghui
    Liu, Hao
    Wang, Longguang
    Li, Kunhong
    Guo, Yulan
    Chen, Zengping
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (06) : 4989 - 5001
  • [36] Self-supervised monocular depth estimation in dynamic scenes with moving instance loss
    Yue, Min
    Fu, Guangyuan
    Wu, Ming
    Zhang, Xin
    Gu, Hongyang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 112
  • [37] Time-based self-supervised learning for Wireless Capsule Endoscopy
    Pascual, Guillem
    Laiz, Pablo
    Garcia, Albert
    Wenzek, Hagen
    Vitria, Jordi
    Segui, Santi
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 146
  • [38] Self-Supervised Learning of Depth and Ego-Motion From Videos by Alternative Training and Geometric Constraints from 3-D to 2-D
    Fang, Jiaojiao
    Liu, Guizhong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (01) : 223 - 233
  • [39] EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity
    Jiang, Zijie
    Okutomi, Masatoshi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 69 - 78
  • [40] Self-supervised monocular depth estimation on water scenes via specular reflection prior
    Lu, Zhengyang
    Chen, Ying
    DIGITAL SIGNAL PROCESSING, 2024, 149