Neural Network based Speaker Classification and Verification Systems with Enhanced Features

Cited by: 0
Authors
Ge, Zhenhao [1]
Iyer, Ananth N. [1]
Cheluvaraja, Srinath [1]
Sundaram, Ram [1]
Ganapathiraju, Aravind [1]
Affiliations
[1] Genesys Inc, Indianapolis, IN 46278 USA
Source
PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) | 2017
Keywords
Neural network; speaker classification; speaker verification; feature engineering;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This work presents a novel framework based on a feed-forward neural network for text-independent speaker classification and verification, two related tasks in speaker recognition. With optimized features and model training, it achieves a 100% classification rate and an Equal Error Rate (EER) below 6%, using only about 1 second and 5 seconds of data, respectively. Two feature-level choices are shown to improve system performance: Voice Activity Detection (VAD) that is stricter than the VAD typically used for speech recognition, which ensures that the more strongly voiced portions are extracted for speaker recognition, and speaker-level mean and variance normalization, which helps eliminate the discrepancy between samples from the same speaker. In building the neural network speaker classifier, the network structure parameters are optimized with grid search, and dynamically reduced regularization parameters are used to keep training from terminating in a local minimum, which allows training to proceed further and reach a lower cost. In speaker verification, performance is improved by two techniques: prediction score normalization, which rewards speaker identity indices with distinct peaks and penalizes weak ones that have high scores but more competitors, and speaker-specific thresholding, which significantly reduces the EER on the ROC curve. The TIMIT corpus at an 8 kHz sampling rate is used. The first 200 male speakers are used to train and test classification performance. Their test files serve as in-domain registered speakers, while data from the remaining 126 male speakers serve as out-of-domain speakers, i.e., impostors in speaker verification.
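Two of the quantities named in the abstract, speaker-level mean and variance normalization and the Equal Error Rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `speaker_mvn` and `equal_error_rate`, the dictionary input layout, and the threshold sweep over observed scores are all assumptions made for illustration, and the abstract does not specify how the paper computes either quantity.

```python
import numpy as np


def speaker_mvn(features_by_speaker):
    """Normalize each speaker's frames to zero mean and unit variance.

    `features_by_speaker` (hypothetical layout) maps a speaker ID to an
    array of shape (num_frames, num_dims). Statistics are computed per
    speaker and per feature dimension.
    """
    normalized = {}
    for speaker, frames in features_by_speaker.items():
        frames = np.asarray(frames, dtype=float)
        mean = frames.mean(axis=0)
        std = frames.std(axis=0)
        std[std == 0] = 1.0  # leave constant dimensions unscaled
        normalized[speaker] = (frames - mean) / std
    return normalized


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. Returns the
    average of the false acceptance and false rejection rates at the
    threshold where the two are closest, together with that threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, eer, best_threshold = np.inf, None, None
    for threshold in thresholds:
        far = np.mean(impostor >= threshold)  # impostors wrongly accepted
        frr = np.mean(genuine < threshold)    # genuine speakers wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_threshold = gap, (far + frr) / 2, threshold
    return eer, best_threshold
```

Under the same assumptions, a speaker-specific threshold of the kind the abstract mentions could be approximated by calling `equal_error_rate` once per registered speaker, on that speaker's genuine and impostor scores only, instead of once on the pooled scores.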
Pages: 1089-1094
Page count: 6