INVESTIGATION OF DEEP BOLTZMANN MACHINES FOR PHONE RECOGNITION

Cited by: 0

Authors:

You, Zhao [1]

Wang, Xiaorui [1]

Xu, Bo [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Interact Digital Media Technol Res Ctr, Beijing, Peoples R China

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

phone recognition; acoustic modeling; Deep Boltzmann Machines; Deep Neural Networks;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In the past few years, deep neural networks (DNNs) have achieved great success in speech recognition. Layer-wise pre-training with a deep belief network (DBN) is known to be one of the critical factors in optimizing the DNN. However, the DBN has one shortcoming: its pre-training procedure is a greedy forward pass. Top-down influences on the inference process are ignored, so the pre-trained DBN is suboptimal. In this paper, we apply the deep Boltzmann machine (DBM) to acoustic modeling. The DBM has the advantages that top-down feedback is incorporated and the parameters of all layers can be jointly optimized. Experiments are conducted on the TIMIT phone recognition task to investigate the DBM-DNN acoustic model. Compared with a DBN-DNN with the same number of parameters, the phone error rate on the core test set is reduced by 3.8% relative, and by an additional 5.1% with dropout fine-tuning.
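
As an illustration of the top-down feedback mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a DBM with two hidden layers of binary units and uses the standard mean-field update, in which the first hidden layer combines bottom-up input from the visible layer with top-down input from the second hidden layer. All function names, parameter names, and the iteration count are hypothetical choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(vec, weights, bias):
    """Return bias[j] + sum_i vec[i] * weights[i][j] for every output unit j."""
    return [bias[j] + sum(vec[i] * weights[i][j] for i in range(len(vec)))
            for j in range(len(bias))]


def transpose(weights):
    return [list(col) for col in zip(*weights)]


def dbn_forward(v, w1, b1, w2, b2):
    """Greedy bottom-up pass: h1 depends only on v, never on h2."""
    h1 = [sigmoid(a) for a in affine(v, w1, b1)]
    h2 = [sigmoid(a) for a in affine(h1, w2, b2)]
    return h1, h2


def dbm_mean_field(v, w1, b1, w2, b2, n_iters=10):
    """Mean-field inference: h1 gets bottom-up input from v and top-down input from h2."""
    mu1, mu2 = dbn_forward(v, w1, b1, w2, b2)  # initialise with a bottom-up pass
    w2_t = transpose(w2)
    no_bias = [0.0] * len(b1)
    bottom_up = affine(v, w1, b1)  # fixed, because v is clamped
    for _ in range(n_iters):
        top_down = affine(mu2, w2_t, no_bias)
        mu1 = [sigmoid(a + t) for a, t in zip(bottom_up, top_down)]
        mu2 = [sigmoid(a) for a in affine(mu1, w2, b2)]
    return mu1, mu2
```

With the second-layer weights set to zero the top-down term vanishes and both functions return the same first-layer activations; with non-zero weights the DBM estimate of the first hidden layer shifts according to what the second layer infers.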

Pages: 7600-7603

Number of pages: 4
