INVESTIGATION OF DEEP BOLTZMANN MACHINES FOR PHONE RECOGNITION

Cited by: 0
Authors
You, Zhao [1]
Wang, Xiaorui [1]
Xu, Bo [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Interact Digital Media Technol Res Ctr, Beijing, Peoples R China
Source
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013
Keywords
phone recognition; acoustic modeling; Deep Boltzmann Machines; Deep Neural Networks;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In the past few years, deep neural networks (DNNs) have achieved great success in speech recognition. Layer-wise pre-training with a deep belief network (DBN) is known to be one of the critical factors in optimizing the DNN. However, the DBN has one shortcoming: its pre-training procedure is a greedy forward pass. Top-down influences on the inference process are ignored, so the pre-trained DBN is suboptimal. In this paper, we attempt to apply the deep Boltzmann machine (DBM) to acoustic modeling. The DBM has the advantages that top-down feedback is incorporated and the parameters of all layers can be jointly optimized. Experiments are conducted on the TIMIT phone recognition task to investigate the DBM-DNN acoustic model. Compared with a DBN-DNN with the same number of parameters, the phone error rate on the core test set is reduced by 3.8% relative, and by an additional 5.1% with dropout fine-tuning.
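The contrast the abstract draws between a greedy forward pass and inference with top-down feedback can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-hidden-layer layout, the toy weights, and the omission of bias terms are all assumptions made for brevity. The bottom-up pass computes each hidden layer from the layer below only, while the mean-field update for the first hidden layer also adds the feedback term from the second hidden layer.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Return W @ v for W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, v):
    """Return W^T @ v for W given as a list of rows."""
    return [sum(W[i][j] * v[i] for i in range(len(W)))
            for j in range(len(W[0]))]


def bottom_up_pass(v, W1, W2):
    """Greedy forward pass: h1 depends only on v, h2 only on h1."""
    h1 = [sigmoid(a) for a in matvec_t(W1, v)]
    h2 = [sigmoid(a) for a in matvec_t(W2, h1)]
    return h1, h2


def mean_field_with_feedback(v, W1, W2, n_iters=10):
    """Mean-field updates for a two-hidden-layer model (biases omitted).

    W1 has shape (n_visible, n_h1) and W2 has shape (n_h1, n_h2).
    Each update of h1 combines bottom-up input from v with
    top-down feedback from h2.
    """
    h1, h2 = bottom_up_pass(v, W1, W2)  # initialise from the forward pass
    for _ in range(n_iters):
        bottom = matvec_t(W1, v)  # input from the visible layer
        top = matvec(W2, h2)      # feedback from the layer above
        h1 = [sigmoid(b + t) for b, t in zip(bottom, top)]
        h2 = [sigmoid(a) for a in matvec_t(W2, h1)]
    return h1, h2


# Hypothetical toy values, chosen only to exercise the functions.
v = [1.0, 0.0]
W1 = [[0.5, -0.5], [0.2, 0.3]]
W2 = [[1.0], [1.0]]
h1_forward, _ = bottom_up_pass(v, W1, W2)
h1_feedback, _ = mean_field_with_feedback(v, W1, W2)
```

With the positive toy weights in `W2`, the feedback term raises every first-layer activation relative to the forward pass; setting `W2` to zeros removes the feedback and the two results coincide.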
Pages: 7600-7603
Page count: 4
Related papers
50 in total
  • [21] BOOSTING ATTRIBUTE AND PHONE ESTIMATION ACCURACIES WITH DEEP NEURAL NETWORKS FOR DETECTION-BASED SPEECH RECOGNITION
    Yu, Dong
    Siniscalchi, Sabato Marco
    Deng, Li
    Lee, Chin-Hui
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4169 - 4172
  • [22] Nonequilibrium thermodynamics of restricted Boltzmann machines
    Salazar, Domingos S. P.
    PHYSICAL REVIEW E, 2017, 96 (02)
  • [23] Facial Expression Recognition using Deep Boltzmann Machine from Thermal Infrared Images
    He, Shan
    Wang, Shangfei
    Lan, Wuwei
    Fu, Huan
    Ji, Qiang
    2013 HUMAINE ASSOCIATION CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2013, : 239 - 244
  • [24] Attractor Manipulation in Denoising Autoencoders for Robust Phone Recognition
    Reza, Shaghayegh
    Seyyedsalehi, Seyyed Ali
    Seyyedsalehi, Seyyedeh Zohreh
    2021 29TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2021, : 454 - 459
  • [25] Natural Scene Recognition Based on Convolutional Neural Networks and Deep Boltzmannn Machines
    Gao, Jingyu
    Yang, Jinfu
    Zhang, Jizhao
    Li, Mingai
    2015 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, 2015, : 2369 - 2374
  • [26] Deep Investigation of the Recent Advances in Dialectal Arabic Speech Recognition
    Alsayadi, Hamzah A.
    Abdelhamid, Abdelaziz A.
    Hegazy, Islam
    Alotaibi, Bandar
    Fayed, Zaki T.
    IEEE ACCESS, 2022, 10 : 57063 - 57079
  • [27] DEEP-LEVEL ACOUSTIC-TO-ARTICULATORY MAPPING FOR DBN-HMM BASED PHONE RECOGNITION
    Badino, Leonardo
    Canevari, Claudia
    Fadiga, Luciano
    Metta, Giorgio
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 370 - 375
  • [28] Graph signal recovery using restricted Boltzmann machines
    Mohan, Ankith
    Nakano, Aiichiro
    Ferrara, Emilio
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 185
  • [29] Source and system features for phone recognition
    Manjunath, K.
    Rao, K.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2015, 18 (02) : 257 - 270
  • [30] Hierarchical Phone Recognition with Compositional Phonetics
    Li, Xinjian
    Li, Juncheng
    Metze, Florian
    Black, Alan W.
    INTERSPEECH 2021, 2021, : 2461 - 2465