A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-based Sparse Representation

Cited by: 2
Authors
Liu, Bin [1 ]
Tao, Jianhua [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing 100190, Peoples R China
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Funding
National Natural Science Foundation of China; National Social Science Fund of China
Keywords
BLSTM-RNN; artificial bandwidth extension; rich acoustic features; exemplar-based sparse representation; NARROW-BAND; SPEECH; CONVERSION; ALGORITHM; MEMORY;
DOI
10.21437/Interspeech.2016-772
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This paper presents a two-stage artificial bandwidth extension (ABE) framework that combines a deep bidirectional Long Short-Term Memory (BLSTM) recurrent neural network with exemplar-based sparse representation to estimate the missing frequency band, and demonstrates the suitability of the proposed method for modeling the log power spectra of speech signals in ABE. In the first stage, the BLSTM-RNN, which can capture information from anywhere in the feature sequence, estimates the log power spectra of the high band; in the second stage, exemplar-based sparse representation, which alleviates the over-smoothing problem, is applied to the generated log power spectra. In addition, rich acoustic features from the low band are used to reduce the reconstruction error. Experimental results demonstrate that the proposed framework achieves significant improvements in both objective and subjective measures over several baseline methods.
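The second stage described in the abstract — refining a possibly over-smoothed high-band estimate with an exemplar-based sparse representation — can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes a generic exemplar dictionary of high-band log-power-spectrum frames and a simple orthogonal matching pursuit (OMP) coder, and only shows the idea of replacing a smoothed estimate with a sparse combination of real exemplars:

```python
import numpy as np

def omp(D, y, n_nonzero=3):
    """Orthogonal matching pursuit: sparse-code vector y over a dictionary D
    whose columns are unit-norm exemplars. Returns the sparse coefficient vector."""
    residual = y.astype(float).copy()
    support = []                       # indices of selected exemplars
    x = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        # re-fit all selected atoms jointly and update the residual
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def refine_highband(y_est, exemplars, n_nonzero=3):
    """Stage-2 sketch: re-express an estimated high-band log-power-spectrum
    frame as a sparse combination of stored exemplar frames, which keeps the
    output close to real speech spectra rather than an over-smoothed average."""
    D = exemplars / np.linalg.norm(exemplars, axis=0, keepdims=True)
    return D @ omp(D, y_est, n_nonzero)
```

Here `exemplars` (one column per training frame) and `n_nonzero` are illustrative assumptions; the actual dictionary construction and sparsity settings in the paper may differ.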
Pages: 3778-3782
Page count: 5
References (32 total)
[1] Aharon M., Elad M., Bruckstein A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322.
[2] Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127.
[3] Cheng Y. IEEE Transactions on Speech and Audio Processing, 1992, 2: 544.
[4] Cooke M., Barker J., Cunningham S., Shao X. An audio-visual corpus for speech perception and automatic speech recognition (L). Journal of the Acoustical Society of America, 2006, 120(5): 2421-2424.
[5] Elad M., Aharon M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 2006, 15(12): 3736-3745.
[6] Enbom N. 1999 IEEE Workshop on Speech Coding Proceedings: Model, Coders, and Error Criteria, 1999: 171. DOI: 10.1109/SCFT.1999.781521.
[7] Erhan D. Journal of Machine Learning Research, 2010, 11: 625.
[8] Fan Y. Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[9] Garofolo J. Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database, 1988.
[10] Gers F.A., Schraudolph N.N., Schmidhuber J. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 2003, 3(1): 115-143.