Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Cited by: 18

Authors:

Shahnawazuddin, Syed [1]

Sinha, Rohit [2]

Pradhan, Gayadhar [1]

Affiliations:

[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna 800005, Bihar, India

[2] Indian Inst Technol, Dept Elect & Elect Engn, Gauhati 781039, India

Source:

IEEE SIGNAL PROCESSING LETTERS | 2017 / Vol. 24 / No. 08

Keywords:

Automatic speech recognition (ASR); deep neural network (DNN); pitch-adaptive features; spectral smoothening; subspace Gaussian mixture model (SGMM); GAUSSIAN MIXTURE MODEL; REPRESENTATIONS; NOISE;

DOI:

10.1109/LSP.2017.2705085

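Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: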

0808 ; 0809 ;

Abstract:

In this letter, the effectiveness of the recently reported SMAC (Spectral Moment time-frequency distribution Augmented by low-order Cepstral) features is evaluated for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients. These features were designed to be robust to both additive noise and pitch variations. We explore the SMAC features in a severely pitch-mismatched ASR task, i.e., decoding children's speech on an ASR system trained on adults' speech. In this task, the SMAC features are still observed to be sensitive to pitch variations. To address this, a simple spectral smoothening approach employing adaptive cepstral truncation is explored prior to the computation of the spectral moments. With the proposed modification, the SMAC features are noted to achieve enhanced pitch robustness without any loss of noise immunity. Furthermore, the effectiveness of the proposed features is explored in three dominant acoustic modeling paradigms and under varying data conditions. In all cases, the proposed features are observed to significantly outperform the existing ones.
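
The idea of smoothing a spectrum by cepstral truncation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: this record does not state how the truncation point is adapted, so the rule used here (keep cepstral coefficients below a fixed fraction of the pitch period) is an assumption, as are the function name `smooth_log_spectrum`, the 0.8 factor, and the FFT size.

```python
import numpy as np

def smooth_log_spectrum(frame, sample_rate, f0_hz, n_fft=512):
    """Smooth one frame's log-magnitude spectrum by truncating its real cepstrum.

    Assumption (not taken from the letter): the truncation point is set to
    a fixed fraction of the pitch period, expressed in samples.
    """
    windowed = frame * np.hamming(len(frame))
    # Log-magnitude spectrum; the small constant avoids log(0).
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)
    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)

    pitch_period = sample_rate / f0_hz            # in samples
    cutoff = max(1, int(0.8 * pitch_period))      # hypothetical adaptation rule
    cutoff = min(cutoff, n_fft // 2)

    # Keep low quefrencies and their mirror image so the lifter stays symmetric.
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[n_fft - cutoff + 1:] = 1.0

    # Transform back: the pitch-harmonic ripple is removed, the envelope is kept.
    smoothed = np.fft.rfft(cepstrum * lifter).real
    return log_mag, smoothed
```

Because the cutoff scales inversely with `f0_hz`, a high-pitched (child) voice gets a shorter lifter than a low-pitched (adult) voice, which is one plausible reading of "adaptive" truncation. Spectral moments would then be computed from the smoothed spectrum instead of the raw one.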

Pages: 1128-1132

Page count: 5
