Improving Children's Speech Recognition Through Time Scale Modification Based Speaking Rate Adaptation

Cited by: 0
Authors
Kathania, Hemant K. [1 ]
Shahnawazuddin, S. [2 ]
Ahmad, Waquar [1 ]
Adiga, Nagraj [3 ]
Jana, S. K. [1 ]
Samaddar, A. B. [4 ]
Affiliations
[1] NIT Sikkim, Dept Elect & Commun Engn, Ravangla, India
[2] NIT Patna, Dept Elect & Commun Engn, Patna, Bihar, India
[3] Univ Crete, Dept Comp Sci, Iraklion, Greece
[4] NIT Sikkim, Dept Comp Sci & Engn, Ravangla, India
Source
2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018
Keywords
Children's speech recognition; acoustic mismatch; speaking-rate adaptation; pitch scaling;
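DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract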
This paper explores the effect of speaking-rate adaptation on children's speech recognition using acoustic models trained on adults' speech. It is well known that the shape of the vocal organs, the pitch, and the speaking rate differ significantly between adult and child speakers. Consequently, recognition performance for children's speech in such a mismatched setup is reported to be extremely poor. A large number of studies have presented techniques to address the acoustic mismatch resulting from differences in pitch and vocal-tract geometry. However, only a few works have studied the role of speaking-rate adaptation in children's speech recognition, and those studies were performed on systems employing Gaussian mixture models. Motivated by these facts, we explore speaking-rate adaptation in the context of systems employing deep neural network based acoustic modeling. Time-scale modification using an approach based on phase-independent iterative spectrogram inversion is employed for speaking-rate adaptation. Adapting the speaking rate yields significant reductions in recognition errors. The effect of combining speaking-rate adaptation with vocal-tract length normalization and pitch scaling is also studied, and combining these techniques with speaking-rate adaptation yields additive improvements.
Pages: 257 - 261
Number of pages: 5
Related papers
50 in total
  • [1] Speaking-Rate Adaptation of Automatic Speech Recognition System through Fuzzy Classification based Time-Scale Modification
    Shahnawazuddin, S.
    Kathania, Hemant K.
    Adiga, Nagaraj
    Sai, B. Tarun
    Ahmad, Waquar
    2019 25TH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2019,
  • [2] Exploring the Role of Speaking-Rate Adaptation on Children's Speech Recognition
    Shahnawazuddin, S.
    Kathania, Hemant K.
    Singh, Chaman
    Ahmad, Waquar
    Pradhan, Gayadhar
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 21 - 25
  • [3] Speaking rate control based on time-scale modification and its effects on the performance of speech recognition
    Kang, Jin Ah
    Choi, Seung Ho
    INTERNATIONAL JOURNAL OF ENGINEERING SYSTEMS MODELLING AND SIMULATION, 2014, 6 (1-2) : 31 - 36
  • [4] Language model and speaking rate adaptation for spontaneous presentation speech recognition
    Nanjo, H
    Kawahara, T
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2004, 12 (04): : 391 - 400
  • [5] Improving Speech Recognition Rate through Analysis Parameters
    Eringis, Deividas
    Tamulevicius, Gintautas
    ELECTRICAL CONTROL AND COMMUNICATION ENGINEERING, 2014, 5 (01) : 61 - 66
  • [6] Speaking-rate dependent decoding and adaptation for spontaneous lecture speech recognition
    Nanjo, H
    Kawahara, T
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 725 - 728
  • [7] Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution
    Watanabe, Shinji
    Nakamura, Atsushi
    Juang, Biing-Hwang
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1088 - +
  • [8] Improving Children's Speech Recognition through Explicit Pitch Scaling based on Iterative Spectrogram Inversion
    Ahmad, W.
    Shahnawazuddin, S.
    Kathania, H. K.
    Pradhan, G.
    Samaddar, A. B.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2391 - 2395
  • [9] Improving the performance of keyword spotting system for children's speech through prosody modification
    Shahnawazuddin, S.
    Maity, Karabi
    Pradhan, Gayadhar
    DIGITAL SIGNAL PROCESSING, 2019, 86 : 11 - 18
  • [10] Improving Children's Speech Recognition through Out-of-Domain Data Augmentation
    Fainberg, Joachim
    Bell, Peter
    Lincoln, Mike
    Renals, Steve
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1598 - 1602