Deep Activation Mixture Model for Speech Recognition

Cited by: 1
Authors
Wu, Chunyang [1 ]
Gales, Mark J. F. [1 ]
Affiliation
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
deep learning; mixture model; speaker adaptation; NEURAL-NETWORK; ADAPTATION;
DOI
10.21437/Interspeech.2017-1233
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning approaches achieve state-of-the-art performance in a range of applications, including speech recognition. However, the parameters of the deep neural network (DNN) are hard to interpret, which makes regularisation and adaptation to speaker or acoustic conditions challenging. This paper proposes the deep activation mixture model (DAMM) to address these problems. The output of one hidden layer is modelled as the sum of a mixture model and a residual model. The mixture model forms an activation-function contour, while the residual model captures fluctuations around that contour. The use of the mixture model gives two advantages: first, it introduces a novel regularisation on the DNN; second, it allows novel adaptation schemes. The proposed approach is evaluated on a large-vocabulary U.S. English broadcast news task. It yields slightly better performance than the DNN baselines, and with utterance-level unsupervised adaptation, the adapted DAMM achieves further performance gains.
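The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation (the abstract gives no equations); it assumes the contour is a sum of Gaussian bumps evaluated over hidden-unit index positions, with a standard affine-plus-nonlinearity residual added on top. All function and parameter names here are illustrative.

```python
import numpy as np

def gaussian_bump(idx, mu, sigma):
    # Unnormalised Gaussian evaluated at hidden-unit indices:
    # one smooth "bump" of the activation contour.
    return np.exp(-0.5 * ((idx - mu) / sigma) ** 2)

def damm_layer(x, W, b, mixture_params):
    """Hidden activations = mixture contour + residual fluctuations.

    x: input vector; W, b: residual affine parameters;
    mixture_params: iterable of (weight, mean, std) components.
    Illustrative sketch only, not the paper's notation.
    """
    k = np.arange(W.shape[0])  # hidden-unit index positions
    # Smooth contour over the layer, shared structure across units.
    contour = sum(c * gaussian_bump(k, mu, sd) for c, mu, sd in mixture_params)
    # Residual model: unit-wise fluctuations around the contour.
    residual = np.tanh(W @ x + b)
    return contour + residual

rng = np.random.default_rng(0)
x = rng.normal(size=16)
W = rng.normal(scale=0.1, size=(32, 16))
b = np.zeros(32)
h = damm_layer(x, W, b, [(1.0, 8.0, 3.0), (0.5, 24.0, 4.0)])
print(h.shape)  # (32,)
```

Under this reading, speaker adaptation could update only the few mixture parameters (weights, means, widths) per speaker while keeping `W` and `b` fixed, which is consistent with the abstract's claim that the mixture component enables compact adaptation schemes.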
Pages: 1611-1615
Page count: 5