Representation Learning for Single-Channel Source Separation and Bandwidth Extension

Cited by: 15
Authors
Zoehrer, Matthias [1 ]
Peharz, Robert [2 ]
Pernkopf, Franz [1 ]
Affiliations
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Intelligent Syst Grp, A-8010 Graz, Austria
[2] Med Univ Graz, IDN Inst Physiol, BioTechMed Graz, Brain Ears & Eyes Pattern Recognit Initiat, A-8010 Graz, Austria
Funding
Austrian Science Fund
Keywords
Bandwidth extension; deep neural networks (DNNs); generative stochastic networks; representation learning; single-channel source separation (SCSS); sum-product networks; SPEAKER ADAPTATION; SPEECH; ALGORITHM; SIGNAL; REGRESSION; NETWORKS; MODELS;
DOI
10.1109/TASLP.2015.2470560
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed, and source-specific prior knowledge is required. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data from the second CHiME speech separation challenge and provide results for speaker-dependent, speaker-independent, matched-noise, and unmatched-noise tasks. GSNs obtain the best PESQ and overall perceptual scores on average across all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, measured in frequency-domain segmental SNR, significantly outperforming SPNs embedded in hidden Markov models as well as the other representation models.
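Model-based SCSS of the kind summarized above typically estimates per-source magnitude spectrograms with the learned representations and then recombines them via a soft time-frequency mask. The paper's specific models (GSNs, SPNs) are not reproduced here; the following is a minimal, generic NumPy sketch of the masking step, using randomly generated stand-in source estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical magnitude-spectrogram estimates for two sources
# (frequency bins x frames); in practice these would come from a
# learned representation model such as a GSN or SPN.
s1_hat = rng.random((257, 100)) + 1e-3
s2_hat = rng.random((257, 100)) + 1e-3
mixture = s1_hat + s2_hat  # idealized additive mixture

# Soft (Wiener-like) time-frequency mask for source 1:
# each bin holds source 1's estimated share of the mixture energy.
mask = s1_hat / (s1_hat + s2_hat)

# Apply the mask to the mixture to recover source 1's magnitudes.
s1_rec = mask * mixture
```

Because the toy mixture is exactly additive, the masked output recovers the first source estimate; with a real mixture, the mask only approximates the source.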
Pages
2398-2409 (12 pages)