A study on unsupervised monaural reverberant speech separation

Cited by: 2
Authors
Hemavathi, R. [1 ]
Kumaraswamy, R. [1 ]
Affiliations
[1] Siddaganga Inst Technol (Visvesvaraya Technol Univ, Belagavi), Dept Elect & Commun Engn, Tumakuru 572103, India
Keywords
Unsupervised speech separation; Speech intelligibility; Reverberant environment; Monaural recordings; Non-negative matrix factorization
DOI
10.1007/s10772-020-09706-x
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics & Communication Technology]
Discipline classification codes
0808; 0809
Abstract
Separating individual source signals is a challenging task in musical and multi-talker source separation. This work studies unsupervised monaural (co-channel) speech separation (UCSS) in reverberant environments. UCSS is the problem of separating the individual speakers in multi-speaker speech without any training data and with minimal information about the sources and mixing conditions. In this paper, state-of-the-art UCSS algorithms based on auditory and statistical approaches are evaluated on reverberant speech mixtures and the results are discussed. This work also proposes the use of multiresolution cochleagram and constant-Q transform (CQT) spectrogram features with two-dimensional non-negative matrix factorization. Results show that, at a T60 of 0.610 s, the proposed algorithm with the CQT spectrogram feature gave improvements of 1.986 and 1.262 in speech intelligibility, and of 0.296 dB and 0.561 dB in signal-to-interference ratio, over the state-of-the-art statistical and auditory approaches respectively.
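To make the factorization pipeline concrete, the sketch below illustrates NMF-based separation of a CQT magnitude spectrogram in Python. It is a minimal sketch, not the paper's NMF2D algorithm: it substitutes scikit-learn's plain NMF with KL divergence for the two-dimensional factorization, assigns components to the two speakers arbitrarily rather than by unsupervised clustering, and the file name, sampling rate, hop size, and number of components are hypothetical choices.

# Minimal sketch of NMF-based separation on a CQT magnitude spectrogram.
# NOT the paper's NMF2D method: plain NMF (scikit-learn) is used as a
# stand-in, and the component-to-speaker grouping below is arbitrary.
import numpy as np
import librosa
import soundfile as sf
from sklearn.decomposition import NMF

SR = 16000
HOP = 256
N_BINS = 84          # 7 octaves x 12 bins per octave
K = 20               # total NMF components (hypothetical choice)

# Load a two-talker reverberant mixture (file name is hypothetical).
mix, _ = librosa.load("mixture.wav", sr=SR)

# Constant-Q transform: complex spectrogram, keep phase for resynthesis.
C = librosa.cqt(mix, sr=SR, hop_length=HOP, n_bins=N_BINS)
V = np.abs(C)        # non-negative magnitude, shape (n_bins, n_frames)

# Plain NMF with KL divergence (multiplicative updates).
nmf = NMF(n_components=K, beta_loss="kullback-leibler", solver="mu",
          init="nndsvda", max_iter=500)
W = nmf.fit_transform(V)            # spectral bases, (n_bins, K)
H = nmf.components_                 # activations,    (K, n_frames)

# Arbitrarily split components between the two speakers; a real
# unsupervised system would cluster them (e.g. by pitch or spectral shape).
groups = [np.arange(K // 2), np.arange(K // 2, K)]

eps = 1e-12
V_hat = W @ H + eps
for i, idx in enumerate(groups):
    # Soft (Wiener-like) mask from this speaker's components.
    mask = (W[:, idx] @ H[idx, :]) / V_hat
    # Apply the mask to the complex CQT and invert back to a waveform.
    est = librosa.icqt(mask * C, sr=SR, hop_length=HOP)
    sf.write(f"speaker_{i + 1}.wav", est, SR)

In the paper's two-dimensional (convolutive) factorization, each basis is additionally shifted along the CQT's log-frequency axis, which is what allows a single basis to follow pitch changes of a speaker; the plain NMF above omits that shift structure for brevity.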
Pages: 451-457
Number of pages: 7