Two step convolutional neural network for automatic glottis localization and segmentation in stroboscopic videos

Cited by: 2
Authors
Belagali, Varun [1]
Rao, Achuth M., V [2]
Gopikishore, Pebbili [3]
Krishnamurthy, Rahul [4]
Ghosh, Prasanta Kumar [2]
Affiliations
[1] RV Coll Engn, Comp Sci & Engn, Bangalore 560059, Karnataka, India
[2] Indian Inst Sci, Elect Engn, Bangalore 560012, Karnataka, India
[3] All India Inst Speech & Hearing, Mysuru 570006, India
[4] Manipal Acad Higher Educ, Kasturba Med Coll, Dept Audiol & Speech Language Pathol, Manipal, Karnataka, India
DOI
10.1364/BOE.396252
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Precise analysis of the vocal fold vibratory pattern in a stroboscopic video plays a key role in the evaluation of voice disorders. Automatic glottis segmentation is one of the preliminary steps in such analysis. In this work, the segmentation task is divided into two subproblems, namely glottis localization and glottis segmentation. A two-step convolutional neural network (CNN) approach is proposed for automatic glottis segmentation. Data augmentation is carried out using two techniques: 1) blind rotation (WB) and 2) rotation with respect to glottis orientation (WO). The dataset used in this study contains stroboscopic videos of 18 subjects with sulcus vocalis, in which the glottis region is annotated by three speech-language pathologists (SLPs). The proposed two-step CNN approach achieves an average localization accuracy of 90.08% and a mean Dice score of 0.65. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
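The abstract reports segmentation quality as a mean Dice score but this record does not say how the score was computed. The sketch below assumes the standard definition, 2|A∩B| / (|A| + |B|), on binary masks. The function name `dice_score`, the toy masks, and the convention for two empty masks are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def dice_score(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(pred) != len(truth) or any(
        len(row_p) != len(row_t) for row_p, row_t in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels set in both masks
    total = 0         # |A| + |B|
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            total += int(p) + int(t)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: predicted glottis pixels vs. annotated pixels.
predicted = [[0, 1, 1],
             [0, 1, 0]]
annotated = [[0, 1, 0],
             [0, 1, 1]]

score = dice_score(predicted, annotated)
```

Under this definition, a mean Dice score such as the one reported would be the average of per-frame scores over the evaluated frames. How the paper aggregates over frames and annotators is not stated in this record.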
Pages: 4695-4713
Number of pages: 19
Related papers
32 entries in total
  • [1] Abdullah Dahlan, 2018, Journal of Physics: Conference Series, V1114, DOI 10.1088/1742-6596/1114/1/012066
  • [2] Least-squares orthogonal distances fitting of circle, sphere, ellipse, hyperbola, and parabola
    Ahn, SJ
    Rauh, W
    Warnecke, HJ
    [J]. PATTERN RECOGNITION, 2001, 34 (12) : 2283 - 2303
  • [3] [Anonymous], 2015, P IEEE C COMP VIS PA
  • [4] [Anonymous], 2016, ARXIV160502688
  • [5] Deep learning for cell image segmentation and ranking
    Araujo, Flavio H. D.
    Silva, Romuere R. V.
    Ushizima, Daniela M.
    Rezende, Mariana T.
    Carneiro, Claudia M.
    Campos Bianchi, Andrea G.
    Medeiros, Fatima N. S.
    [J]. COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2019, 72 : 13 - 21
  • [6] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
    Badrinarayanan, Vijay
    Kendall, Alex
    Cipolla, Roberto
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) : 2481 - 2495
  • [7] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    [J]. COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010 : 177 - 186
  • [8] Cerrolaza J. J., 2011, MAVEBA, P35
  • [9] Chollet Francois, 2015, Keras
  • [10] Generalized overlap measures for evaluation and validation in medical image analysis
    Crum, William R.
    Camara, Oscar
    Hill, Derek L. G.
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2006, 25 (11) : 1451 - 1461