A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation

Cited by: 214
Authors
Hu, Guoning [1 ,2 ]
Wang, DeLiang [3 ,4 ]
Affiliations
[1] Ohio State Univ, Biophys Program, Columbus, OH 43210 USA
[2] AOL Truveo Video Search, San Francisco, CA 94104 USA
[3] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[4] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 08
Funding
National Science Foundation (USA);
Keywords
Computational auditory scene analysis (CASA); iterative procedure; pitch estimation; speech segregation; tandem algorithm; INSTANTANEOUS FREQUENCY; UNVOICED SPEECH; MONAURAL SPEECH; NOISY; SEPARATION; TRACKING; SINGLE; SIGNAL;
DOI
10.1109/TASL.2010.2041110
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
A lot of effort has been made in computational auditory scene analysis (CASA) to segregate speech from monaural mixtures. The performance of current CASA systems on voiced speech segregation is limited by lacking a robust algorithm for pitch estimation. We propose a tandem algorithm that performs pitch estimation of a target utterance and segregation of voiced portions of target speech jointly and iteratively. This algorithm first obtains a rough estimate of target pitch, and then uses this estimate to segregate target speech using harmonicity and temporal continuity. It then improves both pitch estimation and voiced speech segregation iteratively. Novel methods are proposed for performing segregation with a given pitch estimate and pitch determination with given segregation. Systematic evaluation shows that the tandem algorithm extracts a majority of target speech without including much interference, and it performs substantially better than previous systems for either pitch extraction or voiced speech segregation.
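The alternating structure described in the abstract can be illustrated with a small toy loop. This is only a sketch under strong assumptions, not the paper's method: the abstract does not specify the features or estimators, so the example works on a hypothetical array of per-unit dominant frequencies, uses a simple "near a harmonic of the pitch" rule for segregation, and refines pitch by a grid search around the current estimate. Temporal continuity is not modeled. All names (`harmonic_mismatch`, `estimate_mask`, `estimate_pitch`, `tandem`) and all parameter values are illustrative.

```python
import numpy as np

def harmonic_mismatch(freqs, pitch):
    """Relative distance of each frequency to the nearest harmonic of pitch."""
    ratio = freqs / pitch
    harmonic = np.maximum(np.round(ratio), 1.0)
    return np.abs(ratio - harmonic) / harmonic

def estimate_mask(unit_freqs, pitch, tol=0.06):
    """Segregation step: label a unit as target if it lies near a harmonic
    of the current pitch estimate for its frame (tol is an assumed value)."""
    return harmonic_mismatch(unit_freqs, pitch[None, :]) < tol

def estimate_pitch(unit_freqs, mask, prev_pitch, search=0.10):
    """Pitch step: per frame, pick the candidate within +/- search of the
    previous estimate that best explains the units labeled as target."""
    grid = np.arange(80.0, 401.0)  # assumed candidate pitches, 1 Hz steps
    pitch = prev_pitch.copy()
    for t in range(unit_freqs.shape[1]):
        f = unit_freqs[mask[:, t], t]
        if f.size == 0:
            continue  # no target-labeled units: keep the previous estimate
        cand = grid[np.abs(grid - prev_pitch[t]) <= search * prev_pitch[t]]
        cost = [harmonic_mismatch(f, p).mean() for p in cand]
        pitch[t] = cand[int(np.argmin(cost))]
    return pitch

def tandem(unit_freqs, rough_pitch, max_iter=10):
    """Alternate the two steps until the mask stops changing."""
    pitch = np.asarray(rough_pitch, dtype=float)
    mask = estimate_mask(unit_freqs, pitch)
    for _ in range(max_iter):
        pitch = estimate_pitch(unit_freqs, mask, pitch)
        new_mask = estimate_mask(unit_freqs, pitch)
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return pitch, mask

# Toy input: 3 frames, 5 target harmonics plus 2 interfering units per frame.
true_pitch = np.array([190.0, 200.0, 210.0])
target = np.arange(1, 6)[:, None] * true_pitch[None, :]
interference = np.array([[410.0, 330.0, 390.0],
                         [690.0, 470.0, 530.0]])
units = np.vstack([target, interference])

# Start from a rough, constant pitch guess and let the loop refine it.
pitch, mask = tandem(units, rough_pitch=np.full(3, 200.0))
print(pitch)
print(mask.astype(int))
```

In this toy setup the rough 200 Hz guess initially admits an interfering unit in the first and last frames. Re-estimating pitch from the labeled units and then re-labeling removes those units, which mirrors the mutual refinement the abstract describes.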
Pages: 2067-2079
Number of pages: 13