JOINT UNSUPERVISED LEARNING OF HIDDEN MARKOV SOURCE MODELS AND SOURCE LOCATION MODELS FOR MULTICHANNEL SOURCE SEPARATION

Cited by: 0
Authors
Nakatani, Tomohiro [1]
Araki, Shoko [1]
Yoshioka, Takuya [1]
Fujimoto, Masakiyo [1]
Affiliation
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Source
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011
Keywords
Source separation; unsupervised learning; hidden Markov model; steering vector; log power spectrum
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper discusses a multichannel source separation approach that exploits the statistical characteristics of source location cues, characterized by steering vector models (SMs), and those of source log spectra, characterized by hidden Markov models (spectral HMMs). Recently, it was shown that the use of speaker-independent spectral HMMs trained in advance substantially improves the quality of speech signals separated based on source location cues in a computationally efficient manner. However, with this approach, mismatches between the spectral HMMs and the observation may substantially degrade the separation quality, which limits the applicability of this approach. To overcome this problem, this paper proposes a method for learning the parameters of the spectral HMMs jointly with those of the SMs from the observed sound mixtures. Experimental results show that the proposed method works effectively for separation of convolutive sound mixtures.
Pages: 237-240
Page count: 4