Stochastic gradient descent analysis for the evaluation of a speaker recognition

Cited by: 0

Authors:

Ashrf Nasef

Marina Marjanović-Jakovljević

Angelina Njeguš

Affiliations:

[1] Singidunum University, Faculty of Electrical Engineering and Computing

[2] Singidunum University, Faculty of Informatics and Computing

Source:

Analog Integrated Circuits and Signal Processing | 2017 / Vol. 90

Keywords:

Pattern recognition; Speech analysis; Deep learning neural network; Stochastic gradient descent; Learning rate; Dropout rate

DOI:

Not available

Abstract:

Performance optimization in speaker recognition is a challenging task in the field of voice-based human-computer interaction. Many studies have shown that deep learning neural network methods outperform other classifiers. However, these methods have many parameters and require extensive tuning to optimize performance across different supervised learning tasks. In this paper, we show that choosing a good combination of parameters can significantly improve the performance of a deep learning neural network trained with stochastic gradient descent in automatic speaker recognition, even in a noisy environment. The parameters analyzed are the learning rate and the hidden-layer and input-layer dropout rates.

Pages: 389-397

Number of pages: 8

