End-to-end deep learning classification of vocal pathology using stacked vowels

Cited by: 2
Authors
Liu, George S. [1, 2]
Hodges, Jordan M. [3]
Yu, Jingzhi [4]
Sung, C. Kwang [1, 2]
Erickson-DiRenzo, Elizabeth [1, 2]
Doyle, Philip C. [1, 2, 5]
Affiliations
[1] Stanford Univ, Dept Otolaryngol Head & Neck Surg, Stanford Sch Med, Stanford, CA 94305 USA
[2] Stanford Univ, Sch Med, Div Laryngol, Stanford, CA 94305 USA
[3] Stanford Univ, Sch Engn, Comp Sci Dept, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Biomed Data Sci, Biomed Informat, Sch Med, Stanford, CA 94305 USA
[5] Stanford Univ, Sch Med, Div Laryngol, Otolaryngol Head & Neck Surg, 801 Welch Rd, Stanford, CA 94305 USA
Source
LARYNGOSCOPE INVESTIGATIVE OTOLARYNGOLOGY | 2023 / Vol. 8 / Issue 05
Keywords
artificial intelligence; deep learning; voice classification; voice disorders; voice pathology; NEURAL-NETWORKS; VOICE QUALITY; FRAMEWORK; DATABASE
DOI
10.1002/lio2.1144
Chinese Library Classification (CLC) number
R76 [Otorhinolaryngology]
Discipline classification code
100213
Abstract
Objectives: Advances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work builds on previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.
Methods: Voice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1-dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.
Results: For multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model outperformed the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). In particular, the stacked vowel model achieved higher class-specific performance on hyperfunctional dysphonia voice samples than the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).
Conclusions: This study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI-driven screening and classification of vocal pathology. The stacked vowel model architecture in particular shows promise for enhancing such an approach.
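Illustrative sketch (not the authors' implementation): the abstract does not say how the three recordings are combined, so the example below assumes they are stacked as separate input channels of a 1-dimensional convolution. The function names `stack_vowels` and `conv1d`, the truncation to the shortest recording, and the toy values are all hypothetical and serve only to show what "analyzing three vowels simultaneously" could mean at the input layer.

```python
from typing import List, Sequence

Waveform = Sequence[float]


def stack_vowels(a: Waveform, i: Waveform, u: Waveform) -> List[List[float]]:
    """Stack three recordings into a (channels, time) list of lists.

    Assumption: recordings are truncated to the shortest one so that
    every channel has the same number of samples.
    """
    n = min(len(a), len(i), len(u))
    return [list(a[:n]), list(i[:n]), list(u[:n])]


def conv1d(x: List[List[float]], kernel: List[List[float]],
           bias: float = 0.0) -> List[float]:
    """Apply one filter over a multi-channel signal ("valid" mode).

    Computed as cross-correlation, which is what deep learning
    libraries conventionally call convolution. Each output sample
    mixes information from all channels at once.
    """
    channels, width = len(kernel), len(kernel[0])
    if len(x) != channels:
        raise ValueError("kernel and input must have the same channel count")
    steps = len(x[0]) - width + 1
    out = []
    for t in range(steps):
        total = bias
        for c in range(channels):
            for k in range(width):
                total += x[c][t + k] * kernel[c][k]
        out.append(total)
    return out


# Toy "recordings" of different lengths (hypothetical values).
a = [0.0, 1.0, 0.0, -1.0, 0.0]
i = [1.0, 1.0, 1.0, 1.0]
u = [0.5, 0.0, -0.5, 0.0, 0.5, 0.0]

x = stack_vowels(a, i, u)          # 3 channels x 4 samples
kernel = [[1.0, 0.0],              # weights for the /a/ channel
          [0.0, 1.0],              # weights for the /i/ channel
          [2.0, 0.0]]              # weights for the /u/ channel
features = conv1d(x, kernel)
print(features)
```

Under this assumption, a baseline single-vowel model would correspond to a one-channel input, and the stacked pitch model to three channels holding the low, neutral, and high /a/ recordings.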
Pages: 1312-1318
Page count: 7