Improving Children's Speech Recognition Through Time Scale Modification Based Speaking Rate Adaptation

Cited by: 0

Authors:

Kathania, Hemant K. [1]

Shahnawazuddin, S. [2]

Ahmad, Waquar [1]

Adiga, Nagraj [3]

Jana, S. K. [1]

Samaddar, A. B. [4]

Affiliations:

[1] NIT Sikkim, Dept Elect & Commun Engn, Ravangla, India

[2] NIT Patna, Dept Elect & Commun Engn, Patna, Bihar, India

[3] Univ Crete, Dept Comp Sci, Iraklion, Greece

[4] NIT Sikkim, Dept Comp Sci & Engn, Ravangla, India

Source:

2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018

Keywords:

Children's speech recognition; acoustic mismatch; speaking-rate adaptation; pitch scaling

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

In the work presented in this paper, we have explored the effect of speaking-rate adaptation on children's speech recognition using acoustic models trained on adults' speech. It is well known that the shape of the vocal organs, the pitch, and the speaking rate differ significantly between adult and child speakers. Consequently, the recognition performance for children's speech in such a mismatched setup is reported to be extremely poor. To address the acoustic mismatch resulting from the differences in pitch and vocal-tract geometry, a large number of studies have presented a myriad of techniques. However, only a few works have studied the role of speaking-rate adaptation in children's speech recognition. Furthermore, those studies were performed on systems employing Gaussian mixture models. Motivated by these facts, we have explored speaking-rate adaptation in the context of systems employing deep neural network based acoustic modeling. Time-scale modification using an approach based on phase-independent iterative spectrogram inversion is employed for speaking-rate adaptation. Significant reductions in errors are noted when the speaking rates are adapted. Furthermore, the effect of combining speaking-rate adaptation with vocal-tract length normalization and pitch scaling is also studied. Additive improvements are obtained by combining the explored techniques with speaking-rate adaptation.

Citation

Pages: 257-261

Page count: 5