Pitch-Adaptive Front-end Features for Robust Children's ASR

Cited by: 43
Authors
Shahnawazuddin, S. [1]
Dey, Abhishek [1]
Sinha, Rohit [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Children's speech recognition; pitch-adaptive features; DNN; DEEP NEURAL-NETWORKS; SPEECH; RECOGNITION; REPRESENTATIONS;
DOI
10.21437/Interspeech.2016-1020
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this work, we explore some of the challenges in recognizing children's speech with automatic speech recognition (ASR) systems developed using adults' speech. In such mismatched ASR tasks, recognition performance degrades severely because of the gross mismatch in acoustic attributes between the two groups of speakers. Among the various sources of mismatch, we focus on the large difference in average pitch between adult and child speakers. Earlier studies have shown that the Mel-filterbank employed in feature extraction does not sufficiently smooth out the pitch harmonics, particularly for high-pitched child speakers. As a result, the acoustic features derived for adult and child speakers turn out to be significantly mismatched. To address this problem, we propose a simple technique based on adaptive liftering for deriving pitch-robust features. This reduces the sensitivity of the acoustic features to gross variations in pitch across speakers. The proposed features are found to improve performance in a deep neural network (DNN) based ASR system. Further gains are noted when existing feature normalization techniques are also applied.
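The abstract does not specify the lifter shape or how its cutoff is chosen, so the following is only a minimal sketch of the general idea of pitch-adaptive low-time liftering, not the authors' procedure. It assumes a rectangular lifter whose cutoff quefrency is a fixed fraction of the speaker's pitch period; the function name `pitch_adaptive_lifter`, the `fraction` parameter, and the synthetic spectrum are all hypothetical choices made for illustration.

```python
import numpy as np

def pitch_adaptive_lifter(log_spectrum, f0_hz, sample_rate, fraction=0.8):
    """Smooth a two-sided log-magnitude spectrum by low-time liftering.

    Assumption (not from the paper): the cutoff quefrency is
    fraction * pitch period, so it shrinks as pitch rises.
    """
    n_fft = len(log_spectrum)
    # Real cepstrum of the (symmetric) log-magnitude spectrum.
    cepstrum = np.fft.ifft(log_spectrum).real
    pitch_period = sample_rate / f0_hz          # in samples
    cutoff = int(fraction * pitch_period)
    cutoff = max(1, min(cutoff, n_fft // 2))
    # Symmetric rectangular lifter: keep quefrencies below the cutoff.
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    if cutoff > 1:
        lifter[-(cutoff - 1):] = 1.0
    # Back to the log-spectral domain; pitch ripple is removed.
    return np.fft.fft(cepstrum * lifter).real

# Synthetic example: a smooth envelope plus harmonic ripple at F0 = 320 Hz.
sample_rate, n_fft, f0_hz = 16000, 512, 320.0
k = np.arange(n_fft)
pitch_period = sample_rate / f0_hz              # 50 samples
envelope = np.cos(2 * np.pi * k / n_fft)
ripple = 0.5 * np.cos(2 * np.pi * pitch_period * k / n_fft)
log_spec = envelope + ripple

smoothed = pitch_adaptive_lifter(log_spec, f0_hz, sample_rate)
```

In this toy case the ripple sits at a quefrency equal to the pitch period, above the cutoff, so liftering leaves only the envelope. With a lower F0 the pitch period is longer and the same `fraction` gives a larger cutoff, which is what makes the lifter adaptive under this assumption.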
Pages: 3459-3463
Page count: 5