FEATURE EXTRACTION FOR ROBUST SPEECH RECOGNITION BASED ON MAXIMIZING THE SHARPNESS OF THE POWER DISTRIBUTION AND ON POWER FLOORING

Cited by: 75
Authors
Kim, Chanwoo [1]
Stern, Richard M. [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
Robust speech recognition; physiological modeling; sharpness of power distribution; power flooring; auditory threshold;
DOI
10.1109/ICASSP.2010.5495570
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a new robust feature extraction algorithm based on a modified approach to power bias subtraction combined with applying a threshold to the power spectral density. The power bias level is selected as the level above which the signal power distribution is sharpest. Sharpness is measured as the ratio of the arithmetic mean to the geometric mean of the medium-duration power. When this bias level is subtracted, power flooring is applied to enhance robustness. These ideas are used to improve our recently introduced feature extraction algorithm, PNCC (Power Normalized Cepstral Coefficients). Although it is simpler than our previous PNCC, experimental results show that the new PNCC performs better than our previous implementation.
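The bias selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the candidate bias grid, and the choice of floor (a fixed fraction `floor_coeff` of the mean power) are assumptions made here for illustration, and the input is a plain list of positive per-frame power values for a single channel rather than true medium-duration power.

```python
import math


def am_gm_ratio(powers):
    """Ratio of arithmetic mean to geometric mean of positive values.

    The ratio is 1 for a perfectly flat distribution and grows as the
    distribution becomes more peaked ("sharper").
    """
    n = len(powers)
    arithmetic = sum(powers) / n
    geometric = math.exp(sum(math.log(p) for p in powers) / n)
    return arithmetic / geometric


def subtract_bias(powers, bias, floor):
    """Subtract a bias level and clamp every result at a positive floor."""
    return [max(p - bias, floor) for p in powers]


def select_bias(powers, candidates, floor_coeff=1e-3):
    """Return the candidate bias that maximizes the AM/GM sharpness.

    Assumption: the floor is floor_coeff times the mean input power.
    The abstract does not specify how the floor is chosen.
    """
    floor = floor_coeff * sum(powers) / len(powers)
    best_bias, best_sharpness = None, -math.inf
    for bias in candidates:
        sharpness = am_gm_ratio(subtract_bias(powers, bias, floor))
        if sharpness > best_sharpness:
            best_bias, best_sharpness = bias, sharpness
    return best_bias


# Toy example: a constant background of 1.0 with one strong frame.
toy_powers = [1.0, 1.0, 1.0, 10.0]
chosen = select_bias(toy_powers, candidates=[0.0, 0.5, 1.0, 5.0])
cleaned = subtract_bias(toy_powers, chosen, floor=1e-3 * sum(toy_powers) / 4)
```

In this toy case the selected bias coincides with the background level, so the background frames fall to the floor while the strong frame keeps most of its power. The floor also keeps every value positive, which the geometric mean requires.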
Pages: 4574-4577
Number of pages: 4
Related papers
12 in total
[1]   PERCEPTUAL LINEAR PREDICTIVE (PLP) ANALYSIS OF SPEECH [J].
HERMANSKY, H .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1990, 87 (04) :1738-1752
[2]  
Kim C., 2006, INTERSPEECH 2006, P1975
[3]   Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition [J].
Kim, Chanwoo ;
Stern, Richard M. .
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, :188-+
[4]  
Kim C, 2008, INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, P2598
[5]  
Kim C, 2009, INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, P2479
[6]  
Kim C, 2009, INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, P28
[7]  
Moreno P. J., 1996, P IEEE INT C AC SPEE
[8]  
Patterson R. D., 1992, AUDITORY PHYSL PERCE, P429, DOI 10.1016/B978-0-08-041847-6.50054-X
[9]   Missing-feature approaches in speech recognition [J].
Raj, B ;
Stern, RM .
IEEE SIGNAL PROCESSING MAGAZINE, 2005, 22 (05) :101-116
[10]  
Raj B, 1997, INT CONF ACOUST SPEE, P851, DOI 10.1109/ICASSP.1997.596069