Pathological Voice Detection and Classification Based on Multimodal Transmission Network

Times cited: 8
Authors
Geng, Lei [1, 2]
Liang, Yan [2, 3]
Shan, Hongfeng [2, 3]
Xiao, Zhitao [1, 4, 5, 6, 8]
Wang, Wei [4, 5, 6, 8]
Wei, Mei [1, 2, 3, 5, 6, 7, 8]
Affiliations
[1] Tiangong Univ, Sch Life Sci, Tianjin, Peoples R China
[2] Tianjin Key Lab Optoelect Detect Technol & Syst, Tianjin, Peoples R China
[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[4] Tianjin First Cent Hosp, Dept Otorhinolaryngol Head & Neck Surg, Tianjin, Peoples R China
[5] Inst Otolaryngol Tianjin, Tianjin, Peoples R China
[6] Key Lab Auditory Speech & Balance Med, Tianjin, Peoples R China
[7] Key Clin Discipline Tianjin Otolaryngol, Tianjin, Peoples R China
[8] Otolaryngol Clin Qual Control Ctr, Tianjin, Peoples R China
Keywords
Pathological voice; Deep neural network; Automatic detection and classification; Multimodal; Saarbrucken voice database; HEALTH-CARE
DOI
10.1016/j.jvoice.2022.11.018
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Objectives. Describing pronunciation features from multiple perspectives can help doctors accurately diagnose the pathological type of a patient's voice. Using two modalities, the sound signal and the electroglottography (EGG) signal, this paper proposes a pathological voice detection and classification algorithm based on a multimodal transmission network. Methods. First, we applied the short-time Fourier transform (STFT) to both signals and designed a Mel filter to obtain Mel spectrograms. Then, the multimodal transmission network extracted features from the Mel spectrograms and applied the Multimodal Transfer Module (MMTM). Finally, a fusion layer integrated the multimodal information, and a fully connected layer detected and classified voice pathology from the fused features. Results. The experiment was based on 1179 subjects in the Saarbrücken Voice Database (SVD); the average accuracy, recall, specificity, and F1 score of pathological voice classification reached 98.02%, 98.23%, 97.82%, and 97.95%, respectively. Compared with other algorithms, the classification accuracy is significantly improved. Conclusions. The proposed model can integrate information from multiple modalities to obtain more comprehensive and stable voice features and improve the accuracy of pathological voice classification. Future research will explore reducing the model's running time and complexity.
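The Mel filtering step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the filter design, FFT size, sample rate, or number of Mel bands, so the values below are placeholders. The HTK-style Mel formula and the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `apply_filterbank`) are assumptions made for illustration. The sketch builds triangular Mel filters and applies them to one frame of an STFT power spectrum.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to Mel (HTK-style formula; an assumption, not from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, evenly spaced on the Mel scale from 0 to Nyquist.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Centre frequency of each STFT bin.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, center, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (center - lo)
            falling = (hi - f) / (hi - center)
            # Triangle: rises from lo to center, falls from center to hi, else 0.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def apply_filterbank(bank, power_spectrum):
    """Project one STFT power-spectrum frame onto the Mel bands."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


# Placeholder settings (hypothetical, not taken from the paper).
bank = mel_filterbank(n_mels=8, n_fft=64, sample_rate=16000)
frame = [0.0] * 33
frame[10] = 1.0  # all energy in the 2500 Hz bin
mel_frame = apply_filterbank(bank, frame)
```

Applying the same projection to every STFT frame of the sound signal and of the EGG signal, then taking the logarithm, would give one Mel spectrogram per modality. Whether the paper uses log scaling is not stated in the abstract.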
Pages: 591-601
Page count: 11