Improving Children's Speech Recognition Through Time Scale Modification Based Speaking Rate Adaptation

Cited by: 0

Authors:

Kathania, Hemant K. [1]

Shahnawazuddin, S. [2]

Ahmad, Waquar [1]

Adiga, Nagraj [3]

Jana, S. K. [1]

Samaddar, A. B. [4]

Affiliations:

[1] NIT Sikkim, Dept Elect & Commun Engn, Ravangla, India

[2] NIT Patna, Dept Elect & Commun Engn, Patna, Bihar, India

[3] Univ Crete, Dept Comp Sci, Iraklion, Greece

[4] NIT Sikkim, Dept Comp Sci & Engn, Ravangla, India

Source:

2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018

Keywords:

Children's speech recognition; acoustic mismatch; speaking-rate adaptation; pitch scaling

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification code:

0808; 0809

Abstract:

In this paper, we explore the effect of speaking-rate adaptation on children's speech recognition using acoustic models trained on adults' speech. It is well known that the shape of the vocal organs, the pitch, and the speaking rate differ significantly between adult and child speakers. Consequently, recognition performance for children's speech in such a mismatched setup is reported to be extremely poor. A large number of studies have presented techniques to address the acoustic mismatch resulting from differences in pitch and vocal-tract geometry. However, only a few works have studied the role of speaking-rate adaptation in children's speech recognition, and those studies were performed on systems employing Gaussian mixture models. Motivated by these facts, we explore speaking-rate adaptation in the context of systems employing deep neural network based acoustic modeling. Time-scale modification using an approach based on phase-independent iterative spectrogram inversion is employed for speaking-rate adaptation. Adapting the speaking rate yields significant reductions in errors. Furthermore, the effect of combining speaking-rate adaptation with vocal-tract length normalization and pitch scaling is also studied. Additive improvements are obtained by combining the explored techniques with speaking-rate adaptation.

Pages: 257-261

Page count: 5

