Exploring the Effect of Differences in the Acoustic Correlates of Adults' and Children's Speech in the Context of Automatic Speech Recognition

Cited by: 0
Authors
Shweta Ghai
Rohit Sinha
Affiliation
[1] Indian Institute of Technology Guwahati, Department of Electronics and Communication Engineering
Source
EURASIP Journal on Audio, Speech, and Music Processing, Volume 2010
Keywords
Speech Signal; Automatic Speech Recognition; Speech Data; Speaking Rate; Mismatched Condition
DOI
Not available
Abstract
This work explores how mismatches between adults' and children's speech, arising from differences in several acoustic correlates, affect automatic speech recognition performance under mismatched conditions. The correlates studied are the pitch, the speaking rate, the glottal parameters (open quotient, return quotient, and speech quotient), and the formant frequencies. The effect of each correlate is quantified by explicitly normalizing it with existing techniques from the literature. An initial study on a connected digit recognition task shows that, among these parameters, only the formant frequencies, the pitch, and the speaking rate affect automatic speech recognition performance. Normalizing these three parameters yields significant performance improvements. With combined normalization of the pitch, the speaking rate, and the formant frequencies, relative improvements of 80% and 70% over the baseline are obtained for children's and adults' speech recognition, respectively, under mismatched conditions.
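The relative improvements quoted in the abstract can be read as a percentage reduction with respect to the baseline. The sketch below assumes the underlying metric is an error rate such as word error rate, which the abstract does not state; the function name `relative_improvement` and the numeric values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, normalized_error: float) -> float:
    """Return the percentage reduction in error rate relative to the baseline.

    Assumes both arguments are error rates in the same units (for example,
    word error rate in percent); this choice of metric is an assumption.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - normalized_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical error rates for illustration only, NOT results from the paper.
    baseline_wer = 10.0    # error rate before normalization
    normalized_wer = 2.0   # error rate after normalization
    gain = relative_improvement(baseline_wer, normalized_wer)
    print(f"{gain:.1f}% relative improvement over the baseline")
```

Under this reading, an 80% relative improvement means the error rate after normalization is one fifth of the baseline error rate, whatever the absolute baseline value is.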