INVESTIGATION OF DEEP BOLTZMANN MACHINES FOR PHONE RECOGNITION

Cited by: 0

Authors:

You, Zhao [1]

Wang, Xiaorui [1]

Xu, Bo [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Interact Digital Media Technol Res Ctr, Beijing, Peoples R China

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

phone recognition; acoustic modeling; Deep Boltzmann Machines; Deep Neural Networks;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In the past few years, deep neural networks (DNNs) have achieved great success in speech recognition. Layer-wise pre-training as a deep belief network (DBN) is known to be one of the critical factors in optimizing the DNN. However, the DBN has one shortcoming: its pre-training procedure is a greedy forward pass. Top-down influences on the inference process are ignored, so the pre-trained DBN is suboptimal. In this paper, we attempt to apply the deep Boltzmann machine (DBM) to acoustic modeling. The DBM has the advantages that top-down feedback is incorporated and that the parameters of all layers can be jointly optimized. Experiments are conducted on the TIMIT phone recognition task to investigate the DBM-DNN acoustic model. Compared with a DBN-DNN with the same number of parameters, the phone error rate on the core test set is reduced by 3.8% relative, and by an additional 5.1% with dropout fine-tuning.
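The contrast the abstract draws between a greedy forward pass and inference with top-down feedback can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic DBM with two hidden layers, omits biases, and uses hypothetical names (`bottom_up_pass`, `mean_field`, `W1`, `W2`). The mean-field update for the first hidden layer combines input from the visible layer below and from the second hidden layer above, whereas the forward pass uses only the layer below.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    """Return W @ x for a row-major list-of-lists matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def bottom_up_pass(v, W1, W2):
    """Greedy forward pass: each layer sees only the layer below it."""
    h1 = [sigmoid(a) for a in matvec(W1, v)]
    h2 = [sigmoid(a) for a in matvec(W2, h1)]
    return h1, h2

def mean_field(v, W1, W2, n_iters=10):
    """Mean-field inference for a two-hidden-layer DBM (biases omitted).

    W1 has shape (n_h1, n_v) and W2 has shape (n_h2, n_h1).
    """
    W2_T = transpose(W2)
    # Assumption: initialize from the forward pass, a simplification.
    mu1, mu2 = bottom_up_pass(v, W1, W2)
    bottom = matvec(W1, v)  # fixed input from the visible layer
    for _ in range(n_iters):
        top = matvec(W2_T, mu2)  # top-down feedback from layer 2
        mu1 = [sigmoid(b + t) for b, t in zip(bottom, top)]
        mu2 = [sigmoid(a) for a in matvec(W2, mu1)]
    return mu1, mu2

# Toy example with made-up weights (illustrative only).
v = [1.0, 0.0]
W1 = [[0.5, -0.5], [0.1, 0.2]]
W2 = [[1.0, -1.0]]
h1, h2 = bottom_up_pass(v, W1, W2)
mu1, mu2 = mean_field(v, W1, W2)
```

In this toy setup the positive top-down weight pushes the first hidden unit's activation above its forward-pass value and the negative one pushes the second unit's below it. With `W2` set to zero the two procedures give the same first-layer activations.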

Citation

Pages: 7600-7603

Page count: 4
