Gammatone-Filterbank Based Pitch-Normalized Cepstral Coefficients for Zero-Resource Children's ASR
Cited by: 1
Authors: Shahnawazuddin, Syed [1]; Ankita [1]; Kumar, Avinash [2]; Kathania, Hemant Kumar [2]
Affiliations:
[1] Natl Inst Technol Patna, Patna, Bihar, India
[2] Natl Inst Technol Sikkim, Ravangla, India
Source: SPEECH AND COMPUTER, SPECOM 2023, PT I | 2023 / Vol. 14338
Keywords:
Children's ASR;
Zero-resource ASR;
Spectral smoothing;
Gamma-tone-filterbank;
VMD;
SPEECH;
RECOGNITION;
DOI:
10.1007/978-3-031-48309-7_40
Chinese Library Classification (CLC) number:
O42 [Acoustics];
Discipline classification codes:
070206 ;
082403 ;
Abstract:
The work presented in this paper focuses on the zero-resource children's speech recognition task. In such tasks, adults' speech data is used for learning the acoustic models. However, this leads to severe acoustic mismatch and hence poor recognition rates. One of the main mismatch factors is that the pitch values are higher in the case of children's speech. In order to mitigate the ill effects of pitch-induced acoustic mismatch, two front-end speech parameterization techniques are proposed in this study. The proposed approaches employ spectral smoothing based on either pitch-adaptive cepstral truncation or variational mode decomposition. Furthermore, a Gamma-tone filterbank is used for warping the spectra to the ERB scale. Consequently, the cepstral coefficients exhibit lower variance than those obtained using a Mel filterbank. The proposed features are therefore observed to be very effective, resulting in a relative reduction in word error rate of nearly 17% over the baseline.
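The abstract does not give implementation details, so the following is only a minimal sketch of what spectral smoothing by pitch-dependent cepstral truncation could look like. It uses generic low-quefrency liftering of the real cepstrum, which may differ from the authors' actual method. The function name `smooth_log_spectrum`, the `keep_fraction` parameter, and the choice of a Hamming window are assumptions made for illustration; in practice `f0` would come from a pitch tracker.

```python
import numpy as np

def smooth_log_spectrum(frame, fs, f0, keep_fraction=0.8):
    """Smooth a frame's log-magnitude spectrum by truncating its real cepstrum.

    frame: 1-D array of speech samples (one analysis frame)
    fs: sampling rate in Hz
    f0: pitch estimate for this frame in Hz
    keep_fraction: hypothetical knob; the cutoff quefrency is this fraction
                   of the pitch period, so higher pitch means earlier truncation
    """
    n = len(frame)
    windowed = frame * np.hamming(n)
    # Log-magnitude spectrum; the small constant avoids log(0).
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-10)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.ifft(log_mag).real

    # Keep only quefrencies below a fraction of the pitch period (in samples),
    # which discards the cepstral peak caused by pitch harmonics.
    pitch_period = fs / f0
    cutoff = max(1, int(keep_fraction * pitch_period))
    cutoff = min(cutoff, n // 2)

    # Symmetric lifter so the smoothed log spectrum stays real-valued.
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    lifter[n - cutoff + 1:] = 1.0

    smoothed = np.fft.fft(cepstrum * lifter).real
    return smoothed[: n // 2 + 1]  # non-negative frequencies only
```

Because the cutoff scales with the pitch period, a child's higher-pitched frame is truncated at a lower quefrency than an adult's, which is the sense in which this sketch is pitch-adaptive. A filterbank and DCT would then be applied to the smoothed spectrum to obtain cepstral features.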
Pages: 494-505
Number of pages: 12