The Effect of Noise on Deep Learning for Classification of Pathological Voice

Times cited: 4

Authors:

Hasebe, Koki [1]

Kojima, Tsuyoshi [1,2]

Fujimura, Shintaro [1]

Tamura, Keiichi [1]

Kawai, Yoshitaka [1]

Kishimoto, Yo [1]

Omori, Koichi [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Med, Dept Otolaryngol Head & Neck Surg, Kyoto, Japan

[2] Kyoto Univ, Grad Sch Med, Dept Otolaryngol Head & Neck Surg, 54 Shogoin Kawahara Cho, Sakyo Ku, Kyoto 6068507, Japan

Source:

LARYNGOSCOPE | 2024 / Vol. 134 / Issue 08

Funding:

Japan Society for the Promotion of Science (JSPS);

Keywords:

1D-CNN; GRBAS scale; machine learning; noise resilience; voice disorders;

DOI:

10.1002/lary.31303

Chinese Library Classification (CLC) number:

R-3 [Medical research methods]; R3 [Basic medicine];

Discipline classification code:

1001

Abstract:

Objective: This study aimed to evaluate the significance of background noise in machine learning models assessing the GRBAS scale for voice disorders.

Methods: A dataset of 1406 voice samples was collected from retrospective data, and a 5-layer one-dimensional convolutional neural network (1D-CNN) model was constructed using TensorFlow. The dataset was divided into training, validation, and test data. Gaussian noise was added to test samples at various intensities to assess the model's noise resilience. The model's performance was evaluated using accuracy, F1 score, and quadratic weighted Cohen's kappa score.

Results: The model's performance on the GRBAS scale generally declined with increasing noise intensities. For the G scale, accuracy dropped from 70.9% (original) to 8.5% (at the highest noise), F1 score from 69.2% to 1.3%, and Cohen's kappa from 0.679 to 0.0. Similar declines were observed for the remaining RBAS components.

Conclusion: The model's performance was affected by background noise, with substantial decreases in evaluation metrics as noise levels intensified. Future research should explore noise-tolerant techniques, such as data augmentation, to improve the model's noise resilience in real-world settings.

Level of Evidence: This study evaluates a machine learning model using a single dataset without comparative controls. Given its non-comparative design and specific focus, it aligns with Level 4 evidence (case series) under the 2011 OCEBM guidelines.

Laryngoscope, 2024
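
The evaluation protocol summarized in the abstract adds Gaussian noise to test recordings at increasing intensities and then scores predictions with quadratic weighted Cohen's kappa. The sketch below illustrates that protocol. It is not the authors' code, and it rests on several assumptions. The abstract does not say how noise intensity was parameterized, so the sketch adds zero-mean noise of a chosen standard deviation to the raw waveform. It also includes a helper that derives that standard deviation from a target signal-to-noise ratio (SNR) in dB. The function names are hypothetical. GRBAS grades are treated as ordinal classes 0 to 3.

```python
import numpy as np


def noise_std_for_snr(signal, snr_db):
    """Hypothetical helper: std of white Gaussian noise giving `snr_db` relative to `signal`."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)  # mean power of the clean waveform
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return float(np.sqrt(noise_power))


def add_gaussian_noise(signal, noise_std, rng=None):
    """Return a noisy copy of `signal`; the input array is not modified."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    return signal + rng.normal(0.0, noise_std, size=signal.shape)


def quadratic_weighted_kappa(y_true, y_pred, num_classes=4):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades 0..num_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed matrix: O[i, j] counts samples with true grade i and predicted grade j.
    observed = np.zeros((num_classes, num_classes))
    np.add.at(observed, (y_true, y_pred), 1.0)

    # Expected matrix under chance agreement: outer product of the two marginals,
    # scaled so it sums to the same number of samples as O.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Quadratic weights: 0 on the diagonal, 1 for the largest possible disagreement.
    grades = np.arange(num_classes)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (num_classes - 1) ** 2

    denominator = np.sum(weights * expected)
    if denominator == 0:
        # Both label sets use one identical grade only, so kappa is undefined.
        return float("nan")
    return float(1.0 - np.sum(weights * observed) / denominator)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative stand-in for a voice recording: 1 s, 220 Hz tone sampled at 16 kHz.
    clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
    for snr_db in (30, 10, 0):
        noisy = add_gaussian_noise(clean, noise_std_for_snr(clean, snr_db), rng)
        # In a real experiment, `noisy` would be fed to the trained model at this point.
        print(snr_db, round(float(np.std(noisy - clean)), 3))
    print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3]))
```

scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same statistic. The NumPy version is written out only to make the weighting explicit. A model that predicts the same grade for every sample scores a kappa of exactly 0, whatever the distribution of true grades.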


Pages: 3537-3541

Number of pages: 5
