Representation Learning for Single-Channel Source Separation and Bandwidth Extension

Cited by: 15
Authors
Zoehrer, Matthias [1 ]
Peharz, Robert [2 ]
Pernkopf, Franz [1 ]
Affiliations
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Intelligent Syst Grp, A-8010 Graz, Austria
[2] Med Univ Graz, IDN Inst Physiol, BioTechMed Graz, Brain Ears & Eyes Pattern Recognit Initiat, A-8010 Graz, Austria
Funding
Austrian Science Fund;
Keywords
Bandwidth extension; deep neural networks (DNNs); generative stochastic networks; representation learning; single-channel source separation (SCSS); sum-product networks; SPEAKER ADAPTATION; SPEECH; ALGORITHM; SIGNAL; REGRESSION; NETWORKS; MODELS;
DOI
10.1109/TASLP.2015.2470560
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed, and source-specific prior knowledge is required. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data of the 2nd CHiME speech separation challenge and provide results for a speaker-dependent, a speaker-independent, a matched-noise-condition, and an unmatched-noise-condition task. GSNs obtain the best PESQ and overall perceptual score on average in all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, measured in frequency-domain segmental SNR, and significantly outperform SPNs embedded in hidden Markov models as well as the other representation models.
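The ABE results above are reported in frequency-domain segmental SNR. As a rough illustration only: the sketch below computes a simple per-frame variant of this metric on magnitude spectrograms (the paper's exact windowing, normalization, and band selection are not given here, so the function `fd_segmental_snr` and its details are assumptions, not the authors' implementation).

```python
import numpy as np

def fd_segmental_snr(ref_mag, est_mag, eps=1e-10):
    """Frequency-domain segmental SNR in dB, averaged over frames.

    ref_mag, est_mag: magnitude spectrograms, shape (frames, bins).
    A simplified illustrative definition; not the paper's exact metric.
    """
    ref_mag = np.asarray(ref_mag, dtype=float)
    est_mag = np.asarray(est_mag, dtype=float)
    # Per-frame signal energy and reconstruction-error energy over all bins.
    sig = np.sum(ref_mag ** 2, axis=1)
    err = np.sum((ref_mag - est_mag) ** 2, axis=1)
    # Per-frame SNR in dB, then averaged across frames (segmental).
    snr_db = 10.0 * np.log10((sig + eps) / (err + eps))
    return float(np.mean(snr_db))
```

A better bandwidth-extended spectrogram (smaller error in the reconstructed bands) yields a higher score, which is the sense in which GSNs "best" reconstruct the missing bands.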
Pages: 2398-2409
Page count: 12
Related Papers
50 records in total
  • [1] DESIGNING MULTICHANNEL SOURCE SEPARATION BASED ON SINGLE-CHANNEL SOURCE SEPARATION
    Lopez, A. Ramirez
    Ono, N.
    Remes, U.
    Palomaki, K.
    Kurimo, M.
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 469 - 473
  • [2] Single-channel phaseless blind source separation
    Humera Hameed
    Ali Ahmed
    Ubaid U. Fayyaz
    Telecommunication Systems, 2022, 80 : 469 - 475
  • [3] Single-channel phaseless blind source separation
    Hameed, Humera
    Ahmed, Ali
    Fayyaz, Ubaid U.
    TELECOMMUNICATION SYSTEMS, 2022, 80 (03) : 469 - 475
  • [4] Self-Adaption in Single-Channel Source Separation
    Wohlmayr, Michael
    Mohr, Ludwig
    Pernkopf, Franz
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1003 - 1007
  • [5] A maximum likelihood approach to single-channel source separation
    Jang, GJ
    Lee, TW
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 4 (7-8) : 1365 - 1392
  • [6] REPRESENTATION MODELS IN SINGLE CHANNEL SOURCE SEPARATION
    Zoehrer, Matthias
    Pernkopf, Franz
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 713 - 717
  • [7] Deep-learning-based Single-channel Sound Source Separation in Noisy Environments
    Furuya, Ken'ichi
    Miura, Iori
    2024 11TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN, ICCE-TAIWAN 2024, 2024, : 71 - 72
  • [8] LEARNING A HIERARCHICAL DICTIONARY FOR SINGLE-CHANNEL SPEECH SEPARATION
    Bao, Guangzhao
    Xu, Yangfei
    Xu, Xu
    Ye, Zhongfu
    2014 IEEE WORKSHOP ON STATISTICAL SIGNAL PROCESSING (SSP), 2014, : 476 - 479
  • [9] Learning a Discriminative Dictionary for Single-Channel Speech Separation
    Bao, Guangzhao
    Xu, Yangfei
    Ye, Zhongfu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (07) : 1130 - 1138
  • [10] Discriminative NMF and its application to single-channel source separation
    Weninger, Felix
    Le Roux, Jonathan
    Hershey, John R.
    Watanabe, Shinji
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 865 - 869