Co-channel Speech Separation Based on Amplitude Modulation Spectrum Analysis

Cited by: 0
Authors
Qi Hu
Man-Gui Liang
Affiliation
[1] Beijing Jiaotong University, Institute of Information Science
Source
Circuits, Systems, and Signal Processing | 2014 / Vol. 33
Keywords
Amplitude modulation spectrum; Binary mask; Computational auditory scene analysis (CASA); Speech separation
DOI
Not available
Abstract
Considerable effort has been devoted to co-channel (two-talker) speech separation. However, comprehensive analysis of the amplitude modulation spectrum (AMS) as a means of addressing this problem has received little attention. In this paper, we propose an approach that exploits the AMS and performs the separation within the framework of computational auditory scene analysis (CASA). Specifically, the method uses the periodicity encoded in the AMS to perform channel selection. The main features of the approach are: (1) the reassignment method is used to improve the spectral resolution of the AMS over short durations; (2) a template-based pitch detector is used to determine the dominant fundamental frequency (F0) in an individual channel; (3) segmentation and grouping, the two stages of CASA-based approaches, are employed to increase the robustness of channel selection. Systematic evaluation and comparison show that the proposed approach yields better performance than the previous system.
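As a rough illustration of the idea of reading periodicity from the AMS of a single channel, the sketch below computes an envelope spectrum and picks its strongest modulation frequency. It is not the method of the paper: it assumes half-wave rectification as the envelope extractor, a plain windowed DFT in place of the reassignment method, and simple peak picking in place of the template-based pitch detector. The function names, the 80-400 Hz search range, and the 1 Hz step are all hypothetical choices made for this example.

```python
import math


def amplitude_modulation_spectrum(channel, fs, f_lo=80.0, f_hi=400.0, step=1.0):
    """Return (freqs, mags): the envelope spectrum of one channel on [f_lo, f_hi].

    Assumptions (not taken from the paper): the envelope is obtained by
    half-wave rectification, and the spectrum is a Hann-windowed DFT
    evaluated directly at each candidate modulation frequency.
    """
    # Envelope extraction by half-wave rectification.
    env = [max(x, 0.0) for x in channel]
    # Remove the mean so the DC term does not leak into low frequencies.
    mean = sum(env) / len(env)
    env = [e - mean for e in env]

    n = len(env)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    weighted = [w * e for w, e in zip(window, env)]

    freqs, mags = [], []
    n_steps = int((f_hi - f_lo) / step) + 1
    for k in range(n_steps):
        f = f_lo + k * step
        omega = 2.0 * math.pi * f / fs
        re = sum(v * math.cos(omega * i) for i, v in enumerate(weighted))
        im = sum(v * math.sin(omega * i) for i, v in enumerate(weighted))
        freqs.append(f)
        mags.append(math.hypot(re, im))
    return freqs, mags


def dominant_modulation_frequency(channel, fs):
    """Pick the modulation frequency with the largest AMS magnitude.

    This plain peak picking stands in for a pitch detector; it is a
    simplification, not the template-based detector named in the abstract.
    """
    freqs, mags = amplitude_modulation_spectrum(channel, fs)
    best = max(range(len(mags)), key=mags.__getitem__)
    return freqs[best]


if __name__ == "__main__":
    # Synthetic channel: a 1500 Hz carrier amplitude-modulated at 120 Hz.
    fs = 8000
    signal = [
        (1.0 + math.cos(2.0 * math.pi * 120.0 * i / fs))
        * math.cos(2.0 * math.pi * 1500.0 * i / fs)
        for i in range(400)
    ]
    print(dominant_modulation_frequency(signal, fs))
```

For the synthetic channel in the usage example, the reported value is expected to lie close to the 120 Hz modulation rate. In a CASA-style system, a per-channel estimate of this kind would be one input to channel selection, with segmentation and grouping applied afterwards.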
Pages: 565-588
Page count: 23