A study on unsupervised monaural reverberant speech separation

Cited by: 0
Authors
R. Hemavathi
R. Kumaraswamy
Affiliations
[1] Department of Electronics and Communication Engineering, Siddaganga Institute of Technology (Affiliated to Visvesvaraya Technological University, Belagavi)
Source
International Journal of Speech Technology | 2020 / Vol. 23
Keywords
Unsupervised speech separation; Speech intelligibility; Reverberant environment; Monaural recordings; Non-negative matrix factorization;
DOI
Not available
Abstract
Separating individual source signals is a challenging task in musical and multitalker source separation. This work studies unsupervised monaural (co-channel) speech separation (UCSS) in reverberant environments. UCSS is the problem of separating the individual speakers from multispeaker speech without using any training data and with minimal information about the mixing conditions and sources. In this paper, state-of-the-art UCSS algorithms based on auditory and statistical approaches are evaluated on reverberant speech mixtures and the results are discussed. This work also proposes using multiresolution cochleagram and Constant Q Transform (CQT) spectrogram features with two-dimensional non-negative matrix factorization. Results show that, at a T60 of 0.610 s, the proposed algorithm with the CQT spectrogram feature improved speech intelligibility by 1.986 and 1.262, and signal-to-interference ratio by 0.296 dB and 0.561 dB, compared to the state-of-the-art statistical and auditory approaches, respectively.
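The abstract names non-negative matrix factorization as the core of the proposed method. As a rough illustration of the underlying idea only, the sketch below factors a non-negative matrix with the standard multiplicative updates for the generalized Kullback-Leibler divergence and builds one soft mask per component. It is an assumption-laden sketch, not the authors' algorithm: it uses plain (one-dimensional) NMF rather than the two-dimensional variant, a synthetic random matrix stands in for a CQT or cochleagram magnitude spectrogram, and the names `nmf_kl` and `kl_divergence` are hypothetical.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative matrix V (freq x time) as W @ H using
    multiplicative updates for the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps  # spectral bases
    H = rng.random((rank, n_time)) + eps  # temporal activations
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between V and its approximation V_hat."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


# Synthetic stand-in for a magnitude spectrogram (assumption: not real speech).
rng = np.random.default_rng(1)
V = rng.random((64, 2)) @ rng.random((2, 100))

# Same seed, so the 1-iteration run is the first step of the 200-iteration run.
W1, H1 = nmf_kl(V, rank=2, n_iter=1)
W, H = nmf_kl(V, rank=2, n_iter=200)
d_first = kl_divergence(V, W1 @ H1)
d_final = kl_divergence(V, W @ H)

# One soft (Wiener-like) mask per component; the masks sum to about 1.
eps = 1e-12
masks = np.stack([np.outer(W[:, k], H[k]) for k in range(2)]) / (W @ H + eps)
separated = masks * V  # per-component magnitude estimates
```

In a separation setting, each masked magnitude estimate would be combined with the mixture phase and inverted back to a waveform; that step, and the assignment of components to speakers, is outside this sketch.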
Pages: 451-457
Page count: 6