Taming vision transformers for clinical laryngoscopy assessment

Citations: 0
Authors
Zhang, Xinzhu [1]
Zhao, Jing [1]
Zong, Daoming [1]
Ren, Henglei [2]
Gao, Chunli [2]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, North Zhongshan Rd 3663, Shanghai 200062, Peoples R China
[2] Fudan Univ, Eye & ENT Hosp, Fenyang Rd 83, Shanghai 200000, Peoples R China
Keywords
Laryngeal cancer; Deep learning; Transformer; Transfer learning; Medical image classification; Classification
DOI
10.1016/j.jbi.2024.104766
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objective: Laryngoscopy, essential for diagnosing laryngeal cancer (LCA), suffers from high inter-observer variability and depends heavily on endoscopist expertise. Distinguishing precancerous lesions from early-stage cancerous lesions is particularly difficult, even for experienced practitioners, because the two look similar. This study aims to enhance laryngoscopic image analysis to improve early screening and detection of cancer and precancerous conditions.
Methods: We propose MedFormer, a laryngeal cancer classification method based on the Vision Transformer (ViT). To address data scarcity, MedFormer employs a customized transfer learning approach that leverages the representational power of pre-trained transformers. This approach enables robust out-of-domain generalization by fine-tuning only a minimal set of additional parameters.
Results: MedFormer achieves sensitivity and specificity of 98% and 89% for identifying precancerous lesions (leukoplakia) and of 89% and 97% for detecting cancer, significantly surpassing its CNN counterparts. It also performs better than the two selected ViT-based models. In addition, it outperforms physician visual evaluation (PVE) in certain scenarios and at least matches PVE performance in all cases. Visualizations using class activation maps (CAM) and deformable patches demonstrate MedFormer's interpretability, helping clinicians understand the model's predictions.
Conclusion: We highlight the potential of vision transformers in clinical laryngoscopic assessment and present MedFormer as an effective method for the early detection of laryngeal cancer.
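The sensitivity and specificity figures in the Results are per-class values, which for a multi-class classifier are usually obtained by treating one class as positive and all others as negative. The following is a minimal sketch of that one-vs-rest calculation, not the authors' evaluation code. The function name `sensitivity_specificity`, the class labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for the class `positive`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly flagged
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case wrongly flagged
            else:
                tn += 1  # negative case correctly rejected

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Toy labels (assumed for illustration; not data from the study).
y_true = ["leukoplakia", "leukoplakia", "cancer", "cancer", "normal", "normal"]
y_pred = ["leukoplakia", "cancer", "cancer", "cancer", "normal", "leukoplakia"]

for label in ("leukoplakia", "cancer"):
    sens, spec = sensitivity_specificity(y_true, y_pred, positive=label)
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting the pair per class, rather than a single accuracy, shows how many true lesions are missed separately from how many healthy or other-class images are wrongly flagged.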
Pages: 10