Robust HI and dysarthric speaker recognition - perceptual features and models

Cited by: 0
Authors
Revathi, A. [1]
Nagakrishnan, R. [1]
Sasikaladevi, N. [2]
Affiliations
[1] SASTRA Deemed Univ, Dept ECE SEEE, Thanjavur, India
[2] SASTRA Deemed Univ, Dept CSE SoC, Thanjavur, India
Keywords
HI speaker recognition; Dysarthric speaker identification; Vector quantization; Perceptual features; SPATIAL SPEECH RECOGNITION; CONSONANT RECOGNITION; HEARING; NOISE; IDENTIFICATION; LISTENERS; ALGORITHM;
DOI
10.1007/s11042-022-12184-9
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper argues that person authentication systems must accommodate hearing-impaired (HI) and dysarthric speakers. An automated speaker identification system is evaluated using perceptual features, with critical band analysis performed on several non-linear frequency scales, and three kinds of speaker models: vector quantization (VQ) and fuzzy C-means (FCM) iterative clustering templates, and multi-variant hidden Markov models (MHMM). In the training phase, the speech utterances of HI or dysarthric speakers are pre-processed with voice activity detection, pre-emphasis, frame blocking, and windowing. Perceptual features are then extracted, and VQ, FCM, and MHMM models are created for each speaker, with the cluster and mixture sizes varied across the study. In the testing phase, features are extracted from the test utterances and applied to the templates; classification uses a minimum distance criterion for the clustering techniques and a maximum log-likelihood criterion for the MHMM technique. With decision-level fusion of the perceptual features obtained from critical band analysis on the MEL, BARK, and ERB scales, the algorithm achieves an overall accuracy of 100% for all clusters across the cluster sizes tested, for both HI and dysarthric speaker recognition. Decision-level fusion using the FCM and MHMM techniques yields lower overall accuracy than the VQ technique.
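The VQ branch described in the abstract (one codebook per speaker, classification by minimum distance, then decision-level fusion across feature scales) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, plain k-means stands in for the codebook training, perceptual feature extraction is assumed to have been done already, and majority voting is only one assumed way to fuse decisions, since the abstract does not state the fusion rule.

```python
from collections import Counter

import numpy as np


def train_codebook(features, n_clusters=4, n_iter=20, seed=0):
    """Build a VQ codebook with plain k-means (assumed training method)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from randomly chosen feature vectors.
    start = rng.choice(len(features), n_clusters, replace=False)
    codebook = features[start].astype(float)
    for _ in range(n_iter):
        # Distance from every feature vector to every codeword.
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members):  # leave an empty cluster's codeword unchanged
                codebook[k] = members.mean(axis=0)
    return codebook


def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()


def identify(features, codebooks):
    """Minimum distance criterion: pick the speaker with the lowest distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(features, codebooks[spk]))


def fuse_decisions(decisions):
    """Decision-level fusion by majority vote (an assumed fusion rule)."""
    return Counter(decisions).most_common(1)[0][0]
```

In use, one codebook dictionary would be trained per feature type (for example MEL-, BARK-, and ERB-scale features), `identify` would be called once per feature type on the test utterance, and the resulting speaker labels would be passed to `fuse_decisions`.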
Pages: 8215-8233
Number of pages: 19
Related papers
34 in total
  • [1] Improved Handwritten Digit Recognition Using Convolutional Neural Networks (CNN)
    Ahlawat, Savita
    Choudhary, Amit
    Nayyar, Anand
    Singh, Saurabh
    Yoon, Byungun
    [J]. SENSORS, 2020, 20 (12) : 1 - 18
  • [2] Program Guardian: screening system with a novel speaker recognition approach for smart TV
    Chin, Yu-Hao
    Tai, Tzu-Chiang
    Zhao, Jia-Hao
    Wang, Kuang-Yao
    Hong, Chao-Tse
    Wang, Jia-Ching
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (12) : 13881 - 13896
  • [3] Writer identification system for pre-segmented offline handwritten Devanagari characters using k-NN and SVM
    Dargan, Shaveta
    Kumar, Munish
    Garg, Anupam
    Thakur, Kutub
    [J]. SOFT COMPUTING, 2020, 24 (13) : 10111 - 10122
  • [4] A comprehensive survey on the biometric recognition systems based on physiological and behavioral modalities
    Dargan, Shaveta
    Kumar, Munish
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2020, 143
  • [5] Speech recognition in individuals with sensorineural hearing loss
    de Andrade, Adriana Neves
    Martinelli Iorio, Maria Cecilia
    Gil, Daniela
    [J]. BRAZILIAN JOURNAL OF OTORHINOLARYNGOLOGY, 2016, 82 (03) : 334 - 340
  • [6] Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks
    Farhadipour, Aref
    Veisi, Hadi
    Asgari, Mohammad
    Keyvanrad, Mohammad Ali
    [J]. ETRI JOURNAL, 2018, 40 (05) : 643 - 652
  • [7] Improving word recognition in noise among hearing-impaired subjects with a single-channel cochlear noise-reduction algorithm
    Fink, Nir
    Furst, Miriam
    Muchnik, Chava
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2012, 132 (03) : 1718 - 1731
  • [8] Early Detection of Diabetic Retinopathy Using PCA-Firefly Based Deep Learning Model
    Gadekallu, Thippa Reddy
    Khare, Neelu
    Bhattacharya, Sweta
    Singh, Saurabh
    Maddikunta, Praveen Kumar Reddy
    Ra, In-Ho
    Alazab, Mamoun
    [J]. ELECTRONICS, 2020, 9 (02)
  • [9] Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification
    Ghezaiel, Wajdi
    Ben Slimane, Amel
    Ben Braiek, Ezzedine
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (20) : 20973 - 20988
  • [10] The optimal threshold for removing noise from speech is similar across normal and impaired hearing-a time-frequency masking study
    Healy, Eric W.
    Vasko, Jordan L.
    Wang, DeLiang
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 145 (06) : EL581 - EL586