INVESTIGATION OF DEEP BOLTZMANN MACHINES FOR PHONE RECOGNITION

Cited by: 0

Authors:

You, Zhao [1]

Wang, Xiaorui [1]

Xu, Bo [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Interact Digital Media Technol Res Ctr, Beijing, Peoples R China

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

phone recognition; acoustic modeling; Deep Boltzmann Machines; Deep Neural Networks;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In the past few years, deep neural networks (DNNs) have achieved great success in speech recognition. Layer-wise pre-training with a deep belief network (DBN) is known to be one of the critical factors in optimizing the DNN. However, the DBN has one shortcoming: its pre-training procedure is a greedy forward pass. Top-down influences on the inference process are ignored, so the pre-trained DBN is suboptimal. In this paper, we attempt to apply the deep Boltzmann machine (DBM) to acoustic modeling. The DBM has the advantages that top-down feedback is incorporated and the parameters of all layers can be jointly optimized. Experiments are conducted on the TIMIT phone recognition task to investigate the DBM-DNN acoustic model. Compared with a DBN-DNN with the same number of parameters, the phone error rate on the core test set is reduced by 3.8% relative, and by an additional 5.1% with dropout fine-tuning.
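
The contrast the abstract draws between a greedy forward pass and inference with top-down feedback can be illustrated with a minimal sketch. This is not the authors' implementation: the two-hidden-layer layout, the function names (`greedy_forward`, `dbm_mean_field`), the plain bottom-up initialization, and the fixed iteration count are all assumptions made for illustration, using the standard mean-field update for a DBM with binary hidden units.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(x, W, b):
    """Return x @ W + b, with W given as a list of rows (one row per input unit)."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) + b[j] for j in range(len(b))]

def transpose(W):
    return [list(col) for col in zip(*W)]

def greedy_forward(v, W1, b1, W2, b2):
    """Bottom-up pass only: the first hidden layer never sees the second."""
    h1 = [sigmoid(a) for a in affine(v, W1, b1)]
    h2 = [sigmoid(a) for a in affine(h1, W2, b2)]
    return h1, h2

def dbm_mean_field(v, W1, b1, W2, b2, n_iters=10):
    """Mean-field inference: layer 1 combines bottom-up input with top-down feedback."""
    # Assumption: initialize with a plain bottom-up pass.
    mu1, mu2 = greedy_forward(v, W1, b1, W2, b2)
    W2_T = transpose(W2)
    zeros = [0.0] * len(b1)
    bottom_up = affine(v, W1, b1)  # fixed while the visible vector is clamped
    for _ in range(n_iters):
        top_down = affine(mu2, W2_T, zeros)  # feedback from the layer above
        mu1 = [sigmoid(a + t) for a, t in zip(bottom_up, top_down)]
        mu2 = [sigmoid(a) for a in affine(mu1, W2, b2)]
    return mu1, mu2
```

With nonzero second-layer weights, the first-layer activations returned by `dbm_mean_field` differ from those of `greedy_forward`, because each update of layer 1 also depends on the current estimate of layer 2. If the second-layer weights are all zero, the two functions agree.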


Pages: 7600-7603

Number of pages: 4
