Self-supervised learning of monocular depth and ego-motion estimation for non-rigid scenes in wireless capsule endoscopy videos

Cited by: 0
Authors
Liao, Chao [1 ,2 ]
Wang, Chengliang [2 ]
Wang, Peng [2 ]
Wu, Hao [2 ]
Wang, Hongqian [2 ]
Affiliations
[1] Chongqing Univ, Coll Comp Sci, Chongqing, Peoples R China
[2] Army Med Univ, Southwest Hosp, Chongqing, Peoples R China
Keywords
Wireless capsule endoscopy (WCE) images; Monocular depth estimation; Ego-motion estimation; Non-rigid scenes; Transformer; CANCER
DOI
10.1016/j.bspc.2024.105978
CLC number
R318 [Biomedical Engineering]
Discipline code
0831
Abstract
Background and objective: Gastrointestinal (GI) cancers are the most widespread type of cancer worldwide. Wireless capsule endoscopy (WCE), an innovative, capsule-sized endoscope, has the potential to revolutionize both the diagnosis and treatment of GI cancers and other GI diseases by offering patients a less invasive and more comfortable option. Nonetheless, WCE videos frequently exhibit non-rigid transformations and brightness fluctuations, rendering prior simultaneous localization and mapping (SLAM) approaches infeasible. Depth estimates can assist in recognizing and monitoring potential obstructions or anomalies when localization is required.
Methods: In this paper, we present a self-supervised model, SfMLearner-WCE, specifically designed for estimating depth and ego motion in WCE videos. Our approach incorporates a pose estimation network and a Transformer network with a global self-attention mechanism. To ensure high-quality depth and pose estimation, we propose learnable binary per-pixel masks to eliminate misaligned image regions arising from non-rigid transformations or significant changes in lighting. Additionally, we introduce multi-interval frame sampling to enhance training data diversity, coupled with long-term pose consistency regularization.
Results: We present a comprehensive evaluation of SfMLearner-WCE against five state-of-the-art self-supervised SLAM methods on three WCE datasets. The experimental results demonstrate that our approach achieves high-quality depth estimation and high-precision ego-motion estimation for non-rigid scenes in WCE videos, outperforming the other self-supervised SLAM methods. In the quantitative evaluation of depth estimation on the ColonDepth dataset, an absolute relative error of 0.232 was observed. In the quantitative assessment of ego-motion estimation on the ColonSim dataset, a translation drift of 43.176% was achieved at a frame rate of 2 frames per second.
Conclusions: The experimental analysis conducted in this study offers evidence of the effectiveness and robustness of the proposed method, SfMLearner-WCE, in non-rigid scenes of WCE videos. SfMLearner-WCE helps improve diagnostic efficiency, enabling physicians to navigate and analyze WCE videos more effectively, to the benefit of patient outcomes. Our code will be released at https://github.com/fisherliaoc/SfMLearner-WCE.
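To make the training objective concrete, below is a minimal PyTorch sketch of two mechanisms the abstract names: a photometric reconstruction loss gated by learnable per-pixel masks, and multi-interval frame sampling. All names (MaskedPhotometricLoss, sample_pairs), the sigmoid relaxation of the binary mask, the mask-regularization weight, and the interval values are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn as nn

class MaskedPhotometricLoss(nn.Module):
    # Photometric reconstruction loss gated by a learnable per-pixel mask.
    # The mask downweights regions that a rigid warp cannot explain
    # (non-rigid deformation, abrupt lighting changes); a log prior
    # discourages the trivial all-zero mask. The paper's binary mask is
    # relaxed here to a sigmoid for differentiability (assumption).
    def __init__(self, mask_reg_weight: float = 0.2):
        super().__init__()
        self.mask_reg_weight = mask_reg_weight

    def forward(self, target, warped, mask_logits):
        mask = torch.sigmoid(mask_logits)                      # (B, 1, H, W), values in (0, 1)
        photo = (target - warped).abs().mean(1, keepdim=True)  # per-pixel L1 error
        masked_photo = (mask * photo).mean()                   # misaligned pixels contribute little
        mask_prior = -torch.log(mask + 1e-7).mean()            # pulls the mask toward 1
        return masked_photo + self.mask_reg_weight * mask_prior

def sample_pairs(num_frames: int, intervals=(1, 2, 4)):
    # Multi-interval frame sampling: pair each target frame with sources
    # at several temporal gaps, increasing baseline and appearance
    # diversity in the training batches.
    for gap in intervals:
        for t in range(num_frames - gap):
            yield t, t + gap

In a full training loop, warped would come from differentiable warping of a source frame using the predicted depth and relative pose, and the long-term pose consistency regularizer would compare composed short-interval poses against the directly estimated long-interval pose.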
Pages: 15