Speaking-Rate Adaptation of Automatic Speech Recognition System through Fuzzy Classification based Time-Scale Modification

Cited by: 0

Authors:

Shahnawazuddin, S. [1]

Kathania, Hemant K. [2]

Adiga, Nagaraj [3]

Sai, B. Tarun [1]

Ahmad, Waquar [4]

Affiliations:

[1] NIT Patna, Dept ECE, Patna, Bihar, India

[2] NIT Sikkim, Dept ECE, South Sikkim, India

[3] Univ Crete, Dept CS, Iraklion, Greece

[4] NIT Calicut, Dept ECE, Kozhikode, India

Source:

2019 25TH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC) | 2019

Keywords:

Speaking-rate adaptation; automatic speech recognition; time-scale modification; fuzzy classification; SIGNALS;

DOI:

10.1109/ncc.2019.8732255

Chinese Library Classification number:

TN [Electronic technology, communication technology];

Discipline classification code:

0809;

Abstract:

In this paper, we study the role of speaking-rate adaptation (SRA) of automatic speech recognition (ASR) systems. The performance of an ASR system is reported to degrade when the speaking rate is either too fast or too slow. To simulate such a situation, an ASR system was trained on adults' speech and used to transcribe speech data from adult as well as child speakers. Earlier studies have shown that the speaking rate is significantly lower for children than for adults. Consequently, the recognition performance for children's speech was noted to be very poor in contrast to that for adults' speech. To improve the recognition performance for children's speech, the speaking rate was explicitly changed using time-scale modification (TSM). A recently proposed TSM approach based on fuzzy classification of spectral bins has been explored in this regard. The fuzzy-classification-based TSM technique is reported to be superior to state-of-the-art approaches, but its effectiveness has not yet been studied in the context of ASR. The experimental studies presented in this paper show that SRA based on fuzzy classification results in a relative improvement of 30% over the baseline.
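
To make the idea of changing the speaking rate by time-scale modification concrete, the following is a minimal sketch of plain windowed overlap-add (OLA) time stretching. It is not the fuzzy-classification-based TSM technique the abstract refers to, which is not described in this record; OLA is swapped in only because it is the simplest member of the same family. The function names `hann` and `ola_time_stretch`, the parameter names, and the default frame and hop sizes are all assumptions made for illustration.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(signal, alpha, frame_len=512, synth_hop=128):
    """Time-stretch `signal` (a list of floats) by factor `alpha` with plain OLA.

    alpha > 1 lengthens the signal (slower speaking rate);
    alpha < 1 shortens it (faster speaking rate).
    Illustrative only: plain OLA, not the fuzzy-classification TSM.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    window = hann(frame_len)
    # Frames are read every `ana_hop` samples and written every `synth_hop`
    # samples; the ratio of the two hops sets the stretch factor.
    ana_hop = synth_hop / alpha
    n_frames = max(1, int((len(signal) - frame_len) / ana_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights, used for normalization
    for k in range(n_frames):
        start_in = int(round(k * ana_hop))
        start_out = k * synth_hop
        for i in range(frame_len):
            j = start_in + i
            sample = signal[j] if j < len(signal) else 0.0  # zero-pad the tail
            out[start_out + i] += window[i] * sample
            norm[start_out + i] += window[i]
    # Divide by the accumulated window weight so overlapping frames
    # do not change the overall amplitude.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Calling `ola_time_stretch(samples, 0.8)` would shorten an utterance to roughly 80% of its duration. Given the abstract's premise that children speak more slowly than adults, a factor below 1 would be the relevant direction, but the factors actually used in the paper are not stated here. Plain OLA tends to introduce audible artifacts on harmonic signals, which is the kind of limitation that more elaborate TSM methods aim to address.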


Pages: 5
