Deep Activation Mixture Model for Speech Recognition

Cited by: 1
Authors
Wu, Chunyang [1]
Gales, Mark J. F. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
deep learning; mixture model; speaker adaptation; NEURAL-NETWORK; ADAPTATION;
DOI
10.21437/Interspeech.2017-1233
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning approaches achieve state-of-the-art performance in a range of applications, including speech recognition. However, the parameters of the deep neural network (DNN) are hard to interpret, which makes regularisation and adaptation to speaker or acoustic conditions challenging. This paper proposes the deep activation mixture model (DAMM) to address these problems. The output of one hidden layer is modelled as the sum of a mixture model and a residual model. The mixture model forms an activation function contour, while the residual model captures fluctuations around the contour. The use of the mixture model gives two advantages: first, it introduces a novel regularisation on the DNN; second, it allows novel adaptation schemes. The proposed approach is evaluated on a large-vocabulary U.S. English broadcast news task. It yields slightly better performance than the DNN baselines, and on utterance-level unsupervised adaptation, the adapted DAMM acquires further performance gains.
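The abstract's central idea, a hidden-layer output formed as the sum of a smooth mixture contour and a residual term, can be illustrated with a rough NumPy sketch. Everything beyond that sum is an assumption made for illustration and is not taken from this record: the Gaussian form of the mixture components, the placement of hidden units at evenly spaced positions in [0, 1], the tanh residual, the `residual_scale` factor, fixed (not input-dependent) mixture parameters, and the function names `mixture_contour` and `damm_layer`.

```python
import numpy as np

def mixture_contour(num_units, weights, means, variances):
    """Evaluate a 1-D Gaussian mixture at evenly spaced unit positions.

    Assumption: hidden units sit at positions in [0, 1] and the contour
    is a weighted sum of Gaussian densities over those positions.
    """
    positions = np.linspace(0.0, 1.0, num_units)        # shape (num_units,)
    diff = positions[:, None] - means[None, :]          # shape (num_units, K)
    dens = np.exp(-0.5 * diff ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)
    return dens @ weights                               # shape (num_units,)

def damm_layer(x, W, b, weights, means, variances, residual_scale=0.1):
    """Hidden-layer output = mixture contour + residual (illustrative only)."""
    contour = mixture_contour(W.shape[0], weights, means, variances)
    # Assumption: the residual is a small, bounded standard layer activation.
    residual = residual_scale * np.tanh(W @ x + b)
    return contour + residual

# Hypothetical toy sizes: 3 inputs, 5 hidden units, 2 mixture components.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(5, 3))
b = np.zeros(5)
weights = np.array([0.5, 0.5])
means = np.array([0.25, 0.75])
variances = np.array([0.01, 0.01])
h = damm_layer(x, W, b, weights, means, variances)
```

In a sketch of this kind, the few mixture parameters (`weights`, `means`, `variances`) give a compact handle on the layer's overall activation shape, which is one plausible reading of why the abstract links the mixture model to regularisation and adaptation.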
Pages: 1611-1615
Number of pages: 5