Speech Separation based on Deep Belief Network

Cited by: 0

Authors:

Wu Haijia [1]

Zhang Xiongwei [1]

Zhang Liangliang [1]

Zou Xia [1]

Affiliation:

[1] College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007, Jiangsu, People's Republic of China

Source:

Proceedings of the 2015 International Industrial Informatics and Computer Engineering Conference | 2015

Keywords:

speech separation; deep learning; deep belief network; restricted Boltzmann machine; autoencoder; SIGNAL; SEGREGATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Owing to its hierarchical and generative nature, the Deep Belief Network (DBN) is effective for feature representation and extraction in signal processing. In this paper, DBNs are investigated and applied to monaural speech separation. First, two separate DBNs are trained to extract features from the noisy mixed signals and from the target clean speech, respectively. Next, the two types of extracted features are associated by training a back-propagation (BP) neural network that maps the features of the mixed signals to the features of the target speech. Finally, by applying the DBNs together with this mapping network, the target speech can be estimated from the input mixed signal. Experiments are conducted on three kinds of mixtures: female/male speech mixtures, speech/Gaussian-noise mixtures, and speech/music mixtures. The PESQ scores of the extracted speech are 3.32, 2.59, and 3.42, respectively, which indicates that the model performs well on speech separation tasks, especially on mixtures in which the interference signals have obvious spectral structure.
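The three-stage pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. It assumes binary restricted Boltzmann machines trained with one-step contrastive divergence (CD-1), greedy layer-wise DBN pretraining, frame-level features normalized to [0, 1], a one-hidden-layer back-propagation network for the mapping, and recovery of clean-speech features through a top-down pass of the clean-speech DBN. The class names (`RBM`, `DBN`, `MLP`), the `separate` helper, the layer sizes, learning rates, epoch counts, and the toy data are all invented for illustration.

```python
# Hypothetical sketch of the DBN separation pipeline; all sizes, rates and data are assumptions.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))  # clamp avoids exp overflow

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_vis, n_hid, rng):
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
        self.b, self.c, self.rng = [0.0] * n_vis, [0.0] * n_hid, rng  # visible/hidden biases

    def up(self, v):  # p(h_j = 1 | v)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def down(self, h):  # p(v_i = 1 | h)
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def train(self, data, epochs=50, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                ph0 = self.up(v0)
                h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sampled hidden states
                v1 = self.down(h0)  # mean-field reconstruction
                ph1 = self.up(v1)
                for i in range(len(v0)):
                    self.b[i] += lr * (v0[i] - v1[i])
                    for j in range(len(ph0)):
                        self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                for j in range(len(ph0)):
                    self.c[j] += lr * (ph0[j] - ph1[j])

class DBN:
    """Stack of RBMs pretrained greedily, one layer at a time."""
    def __init__(self, sizes, rng):
        self.rbms = [RBM(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

    def train(self, data, epochs=50, lr=0.1):
        for rbm in self.rbms:
            rbm.train(data, epochs, lr)
            data = [rbm.up(v) for v in data]  # this layer's features train the next RBM

    def encode(self, v):  # bottom-up: input features -> top-level code
        for rbm in self.rbms:
            v = rbm.up(v)
        return v

    def decode(self, h):  # top-down: top-level code -> reconstructed input features
        for rbm in reversed(self.rbms):
            h = rbm.down(h)
        return h

class MLP:
    """One-hidden-layer sigmoid network trained by back-propagation on squared error."""
    def __init__(self, n_in, n_hid, n_out, rng):
        self.W1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
        self.W2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_out)]
        self.b1, self.b2 = [0.0] * n_hid, [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(b + sum(w * xi for w, xi in zip(row, x))) for row, b in zip(self.W1, self.b1)]
        y = [sigmoid(b + sum(w * hj for w, hj in zip(row, h))) for row, b in zip(self.W2, self.b2)]
        return h, y

    def train(self, X, Y, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x, t in zip(X, Y):
                h, y = self.forward(x)
                d2 = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]  # output deltas
                d1 = [h[j] * (1.0 - h[j]) * sum(d2[k] * self.W2[k][j] for k in range(len(d2)))
                      for j in range(len(h))]  # hidden deltas, computed before W2 is updated
                for k, dk in enumerate(d2):
                    self.W2[k] = [w - lr * dk * hj for w, hj in zip(self.W2[k], h)]
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d1):
                    self.W1[j] = [w - lr * dj * xi for w, xi in zip(self.W1[j], x)]
                    self.b1[j] -= lr * dj

def separate(mix_dbn, clean_dbn, mapper, mix_frame):
    """Mixture features -> mixture code -> mapped clean code -> estimated clean features."""
    _, clean_code = mapper.forward(mix_dbn.encode(mix_frame))
    return clean_dbn.decode(clean_code)

# Toy usage: hypothetical 6-dimensional frame features scaled to [0, 1].
rng = random.Random(0)
clean = [[float(rng.random() < 0.5) for _ in range(6)] for _ in range(20)]
mixed = [[min(1.0, c + 0.3 * rng.random()) for c in frame] for frame in clean]
mix_dbn, clean_dbn = DBN([6, 8, 4], rng), DBN([6, 8, 4], rng)
mix_dbn.train(mixed)
clean_dbn.train(clean)
mapper = MLP(4, 6, 4, rng)
mapper.train([mix_dbn.encode(x) for x in mixed], [clean_dbn.encode(s) for s in clean])
estimate = separate(mix_dbn, clean_dbn, mapper, mixed[0])
```

The toy data only shows how the pieces fit together and says nothing about separation quality. The abstract does not specify the input features or how estimated clean-speech features are turned back into a waveform before PESQ scoring, so those steps are left out here.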


Pages: 1486-1493

Page count: 8
