Interpretable Computer Vision to Detect and Classify Structural Laryngeal Lesions in Digital Flexible Laryngoscopic Images

Cited by: 16
Authors
Bur, Andres M. [1, 5]
Zhang, Tianxiao [2 ]
Chen, Xiangyu [2 ]
Kavookjian, Hannah [1 ]
Kraft, Shannon [1 ]
Karadaghy, Omar [1 ]
Farrokhian, Nathan [1 ]
Mussatto, Caroline [3 ]
Penn, Joseph [3 ]
Wang, Guanghui [4 ]
Affiliations
[1] Univ Kansas, Dept Otolaryngol Head & Neck Surg, Med Ctr, Kansas City, KS USA
[2] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS USA
[3] Univ Kansas, Sch Med, Kansas City, KS USA
[4] Toronto Metropolitan Univ, Dept Comp Sci, Toronto, ON, Canada
[5] Univ Kansas, Dept Otolaryngol Head & Neck Surg, Sch Med, 3901 Rainbow Blvd, Kansas City, KS 66160 USA
Funding
National Institutes of Health (USA)
Keywords
artificial intelligence; detection; laryngeal cancer; laryngoscopy; neural networks; competence
DOI
10.1002/ohn.411
Chinese Library Classification
R76 [Otorhinolaryngology]
Discipline classification code
100213
Abstract
Objective: To localize structural laryngeal lesions within digital flexible laryngoscopic images and to classify them as benign or suspicious for malignancy using state-of-the-art computer vision detection models.
Study Design: Cross-sectional diagnostic study.
Setting: Tertiary care voice clinic.
Methods: Digital stroboscopic videos and demographic and clinical data were collected from patients evaluated for a structural laryngeal lesion. Laryngoscopic images were extracted from the videos and manually labeled with bounding boxes encompassing the lesion. Four detection models were employed to simultaneously localize and classify structural laryngeal lesions in laryngoscopic images. Classification accuracy, intersection over union (IoU), and mean average precision (mAP) were evaluated as measures of classification, localization, and overall performance, respectively.
Results: In total, 8,172 images from 147 patients were included in the laryngeal image dataset. Classification accuracy was 88.5% for individual laryngeal images and increased to 92.0% when all images belonging to the same sequence (video) were considered. Mean average precision across all four detection models was 50.1 using an IoU threshold of 0.5 to determine successful localization.
Conclusion: Deep neural network-based detection models trained on a labeled dataset of digital laryngeal images have the potential to classify structural laryngeal lesions as benign or suspicious for malignancy and to localize them within an image. Because the predicted bounding box shows which part of the image the model used to reach a diagnosis, clinicians can independently evaluate the model's predictions.
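The abstract relies on two ideas that a short example may make concrete: the IoU criterion for successful localization, and the pooling of per-image predictions over a video sequence. The sketch below is illustrative only and is not the authors' code. The corner-coordinate box format, the function names, and in particular the majority-vote rule are assumptions; the abstract does not state how images from the same sequence were combined, nor whether the 0.5 threshold is inclusive.

```python
from collections import Counter


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max).

    The corner-coordinate format is an assumption made for this sketch.
    """
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_localized(predicted_box, labeled_box, threshold=0.5):
    """Count a prediction as a successful localization at the given IoU threshold.

    Whether the comparison should be >= or > differs between benchmarks;
    >= is an assumption here.
    """
    return iou(predicted_box, labeled_box) >= threshold


def sequence_label(image_labels):
    """Hypothetical sequence-level rule: majority vote over per-image labels."""
    return Counter(image_labels).most_common(1)[0][0]
```

For instance, a predicted box that covers only half of an equally sized labeled box has an IoU of 1/3 and would not count as a successful localization at the 0.5 threshold, while a sequence whose images are mostly classified as benign would be labeled benign under the assumed voting rule.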
Pages: 1564-1572
Page count: 9