INVESTIGATION OF DEEP BOLTZMANN MACHINES FOR PHONE RECOGNITION

Authors
You, Zhao [1]
Wang, Xiaorui [1]
Xu, Bo [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Interact Digital Media Technol Res Ctr, Beijing, Peoples R China
Source
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013
Keywords
phone recognition; acoustic modeling; Deep Boltzmann Machines; Deep Neural Networks;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In the past few years, deep neural networks (DNNs) have achieved great success in speech recognition. Layer-wise pre-training with a deep belief network (DBN) is known to be one of the critical factors in optimizing the DNN. However, the DBN has one shortcoming: its pre-training procedure is a greedy forward pass. Top-down influences on the inference process are ignored, so the pre-trained DBN is suboptimal. In this paper, we apply the deep Boltzmann machine (DBM) to acoustic modeling. The DBM has the advantages that top-down feedback is incorporated and the parameters of all layers can be jointly optimized. Experiments are conducted on the TIMIT phone recognition task to investigate the DBM-DNN acoustic model. Compared with a DBN-DNN with the same number of parameters, the phone error rate on the core test set is reduced by 3.8% relative, and by an additional 5.1% with dropout fine-tuning.
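The contrast the abstract draws between a greedy forward pass and inference with top-down feedback can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a toy DBM with two hidden layers and binary units, and all names (bottom_up_pass, dbm_mean_field, W1, W2, b1, b2), the layer sizes, and the random weights are hypothetical choices made for illustration. The only point is that the mean-field update for the first hidden layer contains a term coming from the layer above, which the bottom-up pass lacks.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))


def bottom_up_pass(v, W1, b1, W2, b2):
    """Greedy forward pass: each layer sees only the layer below it."""
    h1 = sigmoid(v @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return h1, h2


def dbm_mean_field(v, W1, b1, W2, b2, n_iters=10):
    """Mean-field inference in a two-hidden-layer DBM (illustrative sketch).

    The update for the first hidden layer combines bottom-up input from v
    with top-down input from the current estimate of the second layer.
    """
    # Initialise with the bottom-up estimate, then refine iteratively.
    mu1, mu2 = bottom_up_pass(v, W1, b1, W2, b2)
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T + b1)  # top-down term: mu2 @ W2.T
        mu2 = sigmoid(mu1 @ W2 + b2)
    return mu1, mu2


# Hypothetical toy sizes and random parameters, chosen only for the demo.
rng = np.random.default_rng(0)
n_visible, n_h1, n_h2 = 6, 4, 3
W1 = rng.normal(scale=0.5, size=(n_visible, n_h1))
W2 = rng.normal(scale=0.5, size=(n_h1, n_h2))
b1 = np.zeros(n_h1)
b2 = np.zeros(n_h2)
v = rng.integers(0, 2, size=(1, n_visible)).astype(float)

h1, h2 = bottom_up_pass(v, W1, b1, W2, b2)
mu1, mu2 = dbm_mean_field(v, W1, b1, W2, b2)
print("first-layer shift from top-down feedback:", np.abs(mu1 - h1).max())
```

If W2 is set to zero, the top-down term vanishes and the two estimates of the first hidden layer coincide, which is one way to see that the difference comes entirely from the feedback path.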
Pages: 7600-7603
Page count: 4