Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Cited by: 18
Authors
Shahnawazuddin, Syed [1]
Sinha, Rohit [2]
Pradhan, Gayadhar [1]
Affiliations
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna 800005, Bihar, India
[2] Indian Inst Technol, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Automatic speech recognition (ASR); deep neural network (DNN); pitch-adaptive features; spectral smoothening; subspace Gaussian mixture model (SGMM); GAUSSIAN MIXTURE MODEL; REPRESENTATIONS; NOISE;
DOI
10.1109/LSP.2017.2705085
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
This letter evaluates the recently reported SMAC (Spectral Moment time-frequency distribution Augmented by low-order Cepstral) features for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients, and were designed to be robust to both additive noise and pitch variations. We explore the SMAC features in a severely pitch-mismatched ASR task, namely decoding children's speech on an ASR system trained on adults' speech. In this task, the SMAC features are still observed to be sensitive to pitch variations. To address this, a simple spectral smoothening approach employing adaptive cepstral truncation is explored prior to the computation of the spectral moments. With the proposed modification, the SMAC features are noted to achieve enhanced pitch robustness without affecting their noise immunity. Furthermore, the effectiveness of the proposed features is explored in three dominant acoustic modeling paradigms and under varying data conditions. In all cases, the proposed features are observed to significantly outperform the existing ones.
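The abstract does not spell out how the cepstral truncation adapts to pitch, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the truncation point is set to one pitch period in samples (`sample_rate / f0_hz`), so that the cepstral peak caused by pitch harmonics is discarded before any spectral moments would be computed. The function names, the FFT size, and the synthetic test frame are all hypothetical.

```python
import numpy as np

N_FFT = 512  # hypothetical FFT size, not taken from the letter


def log_spectrum(frame, n_fft=N_FFT):
    """Log-magnitude spectrum of a Hamming-windowed frame."""
    windowed = frame * np.hamming(len(frame))
    return np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)


def smooth_by_cepstral_truncation(log_mag, sample_rate, f0_hz, n_fft=N_FFT):
    """Smooth a log spectrum by zeroing high-quefrency cepstral terms.

    Assumption: the cutoff is one pitch period in samples, so it moves
    with the speaker's fundamental frequency f0_hz.
    """
    cepstrum = np.fft.irfft(log_mag, n_fft)      # real cepstrum, length n_fft
    cutoff = int(sample_rate / f0_hz)            # assumed pitch-adaptive cutoff
    cutoff = max(1, min(cutoff, n_fft // 2))
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0                        # keep low quefrencies
    lifter[n_fft - cutoff + 1:] = 1.0            # and their mirrored copies
    return np.fft.rfft(cepstrum * lifter).real   # smoothed log spectrum


# Synthetic high-pitched frame: ten harmonics of a 300 Hz fundamental.
sample_rate = 16000
f0_hz = 300.0
t = np.arange(400) / sample_rate
frame = sum(np.cos(2 * np.pi * k * f0_hz * t) for k in range(1, 11))

raw = log_spectrum(frame)
smoothed = smooth_by_cepstral_truncation(raw, sample_rate, f0_hz)
```

A higher `f0_hz` gives a smaller cutoff, so child speech is smoothed more aggressively than adult speech under this assumption. The spectral moments would then be computed from `smoothed` instead of `raw`.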
Pages: 1128-1132
Page count: 5
Related papers
30 in total
  • [21] Exploring the Role of Pitch-Adaptive Cepstral Features in Context of Children's Mismatched ASR
    Sinha, Rohit
    Shahnawazuddin, S.
    Karthik, Patri Satya
    2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2016
  • [23] A comparison of the speech recognition and pitch ranking abilities of children using a unilateral cochlear implant, bimodal stimulation or bilateral hearing aids
    Looi, Valerie
    Radford, Christopher John
    INTERNATIONAL JOURNAL OF PEDIATRIC OTORHINOLARYNGOLOGY, 2011, 75 (04) : 472 - 482
  • [24] Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition
    Sun, Lei
    Du, Jun
    Xie, Zhipeng
    Xu, Yong
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2018, 90 (07) : 975 - 983
  • [25] Individual differences in language and working memory affect children's speech recognition in noise
    McCreery, Ryan W.
    Spratford, Meredith
    Kirby, Benjamin
    Brennan, Marc
    INTERNATIONAL JOURNAL OF AUDIOLOGY, 2017, 56 (05) : 306 - 315
  • [26] Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: Benefit to speech recognition
    Sudhakar, Prasad
    Ghosh, Prasanta Kumar
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014 : 169 - 173
  • [27] Children's Recognition of Emotional Prosody in Spectrally Degraded Speech Is Predicted by Their Age and Cognitive Status
    Tinnemore, Anna R.
    Zion, Danielle J.
    Kulkarni, Aditya M.
    Chatterjee, Monita
    EAR AND HEARING, 2018, 39 (05) : 874 - 880
  • [28] Sentence Context Facilitation for Children's and Adults' Recognition of Native- and Nonnative-Accented Speech
    Bent, Tessa
    Holt, Rachael Frush
    Miller, Katherine
    Libersky, Emma
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2019, 62 (02) : 423 - 433
  • [29] Mandarin-Speaking Children's Speech Recognition: Developmental Changes in the Influences of Semantic Context and F0 Contours
    Zhou, Hong
    Li, Yu
    Liang, Meng
    Guan, Connie Qun
    Zhang, Linjun
    Shu, Hua
    Zhang, Yang
    FRONTIERS IN PSYCHOLOGY, 2017, 8
  • [30] Effects of Target and Masker Fundamental Frequency Contour Depth on School-Age Children's Speech Recognition in a Two-Talker Masker
    Flaherty, Mary M.
    Buss, Emily
    Libert, Kelsey
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2023, 66 (01) : 400 - 414