Deep Activation Mixture Model for Speech Recognition

Cited by: 1
Authors
Wu, Chunyang [1]
Gales, Mark J. F. [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
deep learning; mixture model; speaker adaptation; NEURAL-NETWORK; ADAPTATION;
DOI
10.21437/Interspeech.2017-1233
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning approaches achieve state-of-the-art performance in a range of applications, including speech recognition. However, the parameters of the deep neural network (DNN) are hard to interpret, which makes regularisation and adaptation to speaker or acoustic conditions challenging. This paper proposes the deep activation mixture model (DAMM) to address these problems. The output of one hidden layer is modelled as the sum of a mixture model and a residual model. The mixture model forms an activation function contour, while the residual model captures fluctuations around the contour. The use of the mixture model gives two advantages: first, it introduces a novel regularisation on the DNN; second, it allows novel adaptation schemes. The proposed approach is evaluated on a large-vocabulary U.S. English broadcast news task. It yields slightly better performance than the DNN baselines, and with utterance-level unsupervised adaptation, the adapted DAMM achieves further performance gains.
Pages: 1611-1615
Page count: 5