IMPROVED SINGLE-CHANNEL SPEECH SEPARATION USING SINUSOIDAL MODELING

Cited by: 11
Authors
Mowlaee, Pejman [1]
Christensen, Mads Graesboll [2]
Jensen, Soren Holdt [1]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
[2] Aalborg Univ, Dept Media Technol, Aalborg, Denmark
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
Mixture estimation; single-channel speech separation; mask-based methods; speaker codebook; RECOGNITION;
DOI
10.1109/ICASSP.2010.5496263
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
We present a novel single-channel separation approach to improve the separation performance while recovering the signals from a mixture. The key idea in this research is to employ a mixture estimator based on unconstrained modified sinusoidal parameters. Compared to the mixmax (binary mask) and Wiener filter (softmask) approaches, the proposed approach works independently of pitch estimates. Furthermore, it is observed that it can achieve acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios while bringing down the complexity by replacing STFT with sinusoidal parameters. Improvements made by the proposed approach are demonstrated by employing PESQ as our objective measure and MUSHRA listening test as our subjective evaluation.
Pages: 21-24
Number of pages: 4