Robust HI and dysarthric speaker recognition - perceptual features and models

Cited by: 0

Authors:

Revathi, A. [1]

Nagakrishnan, R. [1]

Sasikaladevi, N. [2]

Affiliations:

[1] SASTRA Deemed Univ, Dept ECE SEEE, Thanjavur, India

[2] SASTRA Deemed Univ, Dept CSE SoC, Thanjavur, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2022 / Vol. 81 / Issue 06

Keywords:

HI speaker recognition; Dysarthric speaker identification; Vector quantization; Perceptual features; SPATIAL SPEECH RECOGNITION; CONSONANT RECOGNITION; HEARING; NOISE; IDENTIFICATION; LISTENERS; ALGORITHM;

DOI:

10.1007/s11042-022-12184-9

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper argues that person authentication systems must accommodate hearing-impaired (HI) and dysarthric speakers. An automated speaker identification system is evaluated using perceptual features, obtained by critical band analysis on several non-linear frequency scales, together with vector quantization (VQ) and fuzzy C-means (FCM) iterative clustering templates and multi-variant hidden Markov models (MHMM) as representations of HI or dysarthric speakers. In the training phase, the speech utterances are pre-processed by voice activity detection, pre-emphasis, frame blocking, and windowing; perceptual features are then extracted, and VQ and FCM clustering models and MHMM models are created for each speaker, with the cluster size and mixture size varied. In the testing phase, features are extracted from the test utterances and applied to the templates, and classification uses the minimum distance criterion for the clustering techniques and the maximum log-likelihood criterion for the MHMM technique. The algorithm reaches an overall accuracy of 100% with decision-level fusion classification of the perceptual features obtained by critical band analysis on the MEL, BARK, and ERB scales, across the cluster sizes studied, for both hearing-impaired and dysarthric speaker recognition. Decision-level fusion classification using the FCM and MHMM techniques gives lower overall accuracy than the VQ technique.
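The VQ branch of the abstract (one clustering template per speaker, classification by minimum distance) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain k-means for codebook training and Euclidean distance for distortion, and it takes feature matrices as given rather than computing the perceptual features. The function names `train_codebook`, `average_distortion`, and `identify` are hypothetical, and the FCM, MHMM, and decision-level fusion steps are not shown.

```python
import numpy as np


def train_codebook(features, n_clusters, n_iter=20, seed=0):
    """Build a VQ codebook with plain k-means (an assumed choice)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen training frames.
    idx = rng.choice(len(features), n_clusters, replace=False)
    codebook = features[idx].copy()
    for _ in range(n_iter):
        # Distance from every frame to every centroid.
        dists = np.linalg.norm(
            features[:, None, :] - codebook[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members):  # keep the old centroid if a cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def average_distortion(features, codebook):
    """Mean distance from each frame to its nearest codeword."""
    dists = np.linalg.norm(
        features[:, None, :] - codebook[None, :, :], axis=2
    )
    return dists.min(axis=1).mean()


def identify(features, codebooks):
    """Return the speaker whose codebook gives the minimum distortion."""
    return min(
        codebooks,
        key=lambda spk: average_distortion(features, codebooks[spk]),
    )
```

In use, `codebooks` would be a dictionary mapping each enrolled speaker to the codebook trained on that speaker's feature frames, and `identify` would be called once per test utterance. The cluster size (`n_clusters`) is the parameter the abstract describes as being varied.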


Pages: 8215 - 8233

Page count: 19
