End-to-end deep learning classification of vocal pathology using stacked vowels

Cited by: 2

Authors:

Liu, George S. [1,2]

Hodges, Jordan M. [3]

Yu, Jingzhi [4]

Sung, C. Kwang [1,2]

Erickson-DiRenzo, Elizabeth [1,2]

Doyle, Philip C. [1,2,5]

Affiliations:

[1] Stanford Univ, Dept Otolaryngol Head & Neck Surg, Stanford Sch Med, Stanford, CA 94305 USA

[2] Stanford Univ, Sch Med, Div Laryngol, Stanford, CA 94305 USA

[3] Stanford Univ, Sch Engn, Comp Sci Dept, Stanford, CA 94305 USA

[4] Stanford Univ, Dept Biomed Data Sci, Biomed Informat, Sch Med, Stanford, CA 94305 USA

[5] Stanford Univ, Sch Med, Div Laryngol, Otolaryngol Head & Neck Surg, 801 Welch Rd, Stanford, CA 94305 USA

Source:

LARYNGOSCOPE INVESTIGATIVE OTOLARYNGOLOGY | 2023 / Vol. 8 / Issue 05

Keywords:

artificial intelligence; deep learning; voice classification; voice disorders; voice pathology; NEURAL-NETWORKS; VOICE QUALITY; FRAMEWORK; DATABASE;

DOI:

10.1002/lio2.1144

Chinese Library Classification (CLC) number:

R76 [Otorhinolaryngology];

Subject classification code:

100213;

Abstract:

Objectives: Advances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work develops upon previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.

Methods: Voice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1-dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.

Results: For multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model demonstrated higher performance compared with the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). Specifically, the stacked vowel model achieved higher performance for class-specific classification of hyperfunctional dysphonia voice samples compared with the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).

Conclusions: This study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI-driven screening and classification of vocal pathology. The stacked vowel model architecture in particular offers promise to enhance such an approach.
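The abstract does not give layer sizes, sampling rates, or how recordings of different lengths are aligned, so the following is only a minimal sketch of what "stacking" could mean for a 1-dimensional convolution: each recording becomes one input channel, and a single filter reads all channels at the same time step. Every name, waveform, and weight here (`stack_recordings`, `conv1d`, the toy kernels, zero-padding to a common length) is a hypothetical illustration, not the authors' implementation.

```python
# Hypothetical sketch: names, sizes, and weights are illustrative assumptions,
# not the architecture used in the paper.

def stack_recordings(recordings, length):
    """Crop or zero-pad each waveform to `length` and stack them as channels."""
    stacked = []
    for wave in recordings:
        clipped = list(wave[:length])
        clipped += [0.0] * (length - len(clipped))
        stacked.append(clipped)
    return stacked  # shape: (channels, length)


def conv1d(stacked, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation) over time, summed across channels.

    `kernel` has shape (channels, width); the output has length - width + 1 samples.
    """
    channels = len(stacked)
    length = len(stacked[0])
    width = len(kernel[0])
    assert len(kernel) == channels, "kernel needs one row per input channel"
    out = []
    for t in range(length - width + 1):
        total = bias
        for c in range(channels):
            for k in range(width):
                total += kernel[c][k] * stacked[c][t + k]
        out.append(total)
    return out


# Toy waveforms standing in for the /a/, /i/, /u/ recordings of one speaker.
vowel_a = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
vowel_i = [1.0, 0.0, -1.0, 0.0, 1.0]            # shorter, so it is zero-padded
vowel_u = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]   # longer, so it is cropped

stacked = stack_recordings([vowel_a, vowel_i, vowel_u], length=6)

# Baseline-style input: one vowel, one channel.
single = conv1d([stacked[0]], kernel=[[1.0, -1.0]])

# Stacked-vowel-style input: three channels read by one filter at once.
multi = conv1d(stacked, kernel=[[1.0, -1.0], [1.0, -1.0], [0.0, 2.0]])
```

Under this reading, the stacked pitch variant would differ only in which recordings are passed to `stack_recordings` (the /a/ vowel at low, neutral, and high pitch instead of three vowels at neutral pitch); the convolution itself is unchanged.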

Pages: 1312-1318

Page count: 7
