The Effect of Noise on Deep Learning for Classification of Pathological Voice

Cited by: 4
Authors
Hasebe, Koki [1]
Kojima, Tsuyoshi [1, 2]
Fujimura, Shintaro [1]
Tamura, Keiichi [1]
Kawai, Yoshitaka [1]
Kishimoto, Yo [1]
Omori, Koichi [1]
Affiliations
[1] Kyoto Univ, Grad Sch Med, Dept Otolaryngol Head & Neck Surg, Kyoto, Japan
[2] Kyoto Univ, Grad Sch Med, Dept Otolaryngol Head & Neck Surg, 54 Shogoin Kawahara Cho,Sakyo Ku, Kyoto 6068507, Japan
Funding
Japan Society for the Promotion of Science
Keywords
1D-CNN; GRBAS scale; machine learning; noise resilience; voice disorders;
DOI
10.1002/lary.31303
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Objective: This study aimed to evaluate the significance of background noise in machine learning models assessing the GRBAS scale for voice disorders.
Methods: A dataset of 1406 voice samples was collected from retrospective data, and a 5-layer 1D convolutional neural network (CNN) model was constructed using TensorFlow. The dataset was divided into training, validation, and test data. Gaussian noise was added to test samples at various intensities to assess the model's noise resilience. The model's performance was evaluated using accuracy, F1 score, and quadratic weighted Cohen's kappa score.
Results: The model's performance on the GRBAS scale generally declined with increasing noise intensities. For the G scale, accuracy dropped from 70.9% (original) to 8.5% (at the highest noise), F1 score from 69.2% to 1.3%, and Cohen's kappa from 0.679 to 0.0. Similar declines were observed for the remaining RBAS components.
Conclusion: The model's performance was affected by background noise, with substantial decreases in evaluation metrics as noise levels intensified. Future research should explore noise-tolerant techniques, such as data augmentation, to improve the model's noise resilience in real-world settings.
Level of Evidence: This study evaluates a machine learning model using a single dataset without comparative controls. Given its non-comparative design and specific focus, it aligns with Level 4 evidence (case series) under the 2011 OCEBM guidelines.
Laryngoscope, 2024
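The evaluation procedure described in the Methods (adding Gaussian noise to test samples, then scoring with quadratic weighted Cohen's kappa) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the use of a noise standard deviation `sigma` as the "intensity", and the fixed seed are all assumptions made for illustration; the abstract does not state how noise intensity was parametrised. The four classes correspond to the 0 to 3 grades of each GRBAS component.

```python
import random


def add_gaussian_noise(signal, sigma, seed=0):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    `sigma` (noise standard deviation) stands in for "noise intensity";
    this parametrisation is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1."""
    n = len(y_true)
    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic penalty: larger grade differences cost more.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under chance agreement (product of marginals).
            weighted_expected += w * row[i] * col[j] / n
    if weighted_expected == 0:
        # Degenerate case (single grade everywhere); 0.0 is a convention here.
        return 0.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    clean = [0.0, 0.5, -0.5, 0.25]
    for sigma in (0.0, 0.1, 1.0):  # hypothetical intensities
        print(sigma, add_gaussian_noise(clean, sigma))
    print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2]))
```

In this sketch, perfect agreement gives a kappa of 1.0, and predictions that are all the same grade give 0.0, because the weighted disagreement then equals what chance alone would produce.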
Pages: 3537-3541
Number of pages: 5