Speaking-Rate Adaptation of Automatic Speech Recognition System through Fuzzy Classification based Time-Scale Modification

Cited by: 0

Authors:

Shahnawazuddin, S. [1]

Kathania, Hemant K. [2]

Adiga, Nagaraj [3]

Sai, B. Tarun [1]

Ahmad, Waquar [4]

Affiliations:

[1] NIT Patna, Dept ECE, Patna, Bihar, India

[2] NIT Sikkim, Dept ECE, South Sikkim, India

[3] Univ Crete, Dept CS, Iraklion, Greece

[4] NIT Calicut, Dept ECE, Kozhikode, India

Source:

2019 25TH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC) | 2019

Keywords:

Speaking-rate adaptation; automatic speech recognition; time-scale modification; fuzzy classification; SIGNALS;

DOI:

10.1109/ncc.2019.8732255

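Chinese Library Classification (CLC) number:

TN [Electronic technology, communication technology];

Discipline classification code:

0809;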

Abstract:

In this paper, we study the role of speaking-rate adaptation (SRA) in automatic speech recognition (ASR) systems. The performance of an ASR system is reported to degrade when the speaking rate is either too fast or too slow. To simulate such a situation, an ASR system was trained on adults' speech and used to transcribe speech data from adult as well as child speakers. Earlier studies have shown that the speaking rate is significantly lower for children than for adults. Consequently, the recognition performance for children's speech was noted to be very poor in contrast to that for adults' speech. To improve the recognition performance on children's speech, the speaking rate was explicitly changed using time-scale modification (TSM). A recently proposed TSM approach based on fuzzy classification of spectral bins has been explored in this regard. The fuzzy-classification-based TSM technique is reported to be superior to state-of-the-art approaches, but its effectiveness has not yet been studied in the context of ASR. The experimental studies presented in this paper show that SRA based on fuzzy classification results in a relative improvement of 30% over the baseline.


Pages: 5
