Deep Activation Mixture Model for Speech Recognition

Cited by: 1
Authors
Wu, Chunyang [1]
Gales, Mark J. F. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
deep learning; mixture model; speaker adaptation; NEURAL-NETWORK; ADAPTATION;
DOI
10.21437/Interspeech.2017-1233
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning approaches achieve state-of-the-art performance in a range of applications, including speech recognition. However, the parameters of the deep neural network (DNN) are hard to interpret, which makes regularisation and adaptation to speaker or acoustic conditions challenging. This paper proposes the deep activation mixture model (DAMM) to address these problems. The output of one hidden layer is modelled as the sum of a mixture model and a residual model. The mixture model forms an activation function contour, while the residual model captures fluctuations around the contour. The use of the mixture model gives two advantages: first, it introduces a novel regularisation on the DNN; second, it allows novel adaptation schemes. The proposed approach is evaluated on a large-vocabulary U.S. English broadcast news task. It yields slightly better performance than the DNN baselines, and with utterance-level unsupervised adaptation, the adapted DAMM achieves further performance gains.
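The layer decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not give the parametrisation, so everything specific here is an assumption made for illustration: the Gaussian-shaped contour over hypothetical unit positions, the softmax mixture weights, the tanh residual, and all function and variable names. It shows only the general idea of "smooth contour plus residual", not the authors' implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def damm_layer(x, W_mix, W_res, b_res, means, widths, positions):
    """Hypothetical DAMM-style hidden layer: output = contour + residual.

    Assumed (not from the abstract): units sit at 1-D `positions`, and the
    contour is a mixture of K Gaussian bumps over those positions.
    """
    # Input-dependent mixture weights, shape (K,), summing to 1.
    weights = softmax(W_mix @ x)
    # Each component is a Gaussian bump evaluated at every unit, shape (K, H).
    scaled = (positions[None, :] - means[:, None]) / widths[:, None]
    bumps = np.exp(-0.5 * scaled ** 2)
    # Mixture model: smooth activation contour across the layer, shape (H,).
    contour = weights @ bumps
    # Residual model: per-unit fluctuations around the contour, shape (H,).
    residual = np.tanh(W_res @ x + b_res)
    return contour + residual, contour, residual

# Toy dimensions: D inputs, H hidden units, K mixture components.
rng = np.random.default_rng(0)
D, H, K = 4, 6, 2
x = rng.standard_normal(D)
W_mix = rng.standard_normal((K, D))
W_res = 0.1 * rng.standard_normal((H, D))
b_res = np.zeros(H)
positions = np.linspace(0.0, 1.0, H)   # assumed unit layout
means = np.array([0.25, 0.75])         # assumed component centres
widths = np.array([0.2, 0.2])          # assumed component widths

out, contour, residual = damm_layer(
    x, W_mix, W_res, b_res, means, widths, positions
)
print(out.shape)
```

Under these assumptions the contour is governed by only a handful of parameters (`means`, `widths`, `W_mix`), which is one plausible reading of why such a component could lend itself to adaptation from little data, for instance by updating only those parameters per utterance.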
Pages: 1611-1615
Number of pages: 5