Data Augmentation using Healthy Speech for Dysarthric Speech Recognition

Cited by: 0

Authors:

Vachhani, Bhavik [1]

Bhat, Chitralekha [1]

Kopparapu, Sunil Kumar [1]

Affiliations:

[1] TCS Res & Innovat, Mumbai, Maharashtra, India

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Dysarthric speech recognition; Data augmentation; Dysarthria severity;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Dysarthria refers to a speech disorder caused by trauma to the brain areas concerned with motor aspects of speech, giving rise to effortful, slow, slurred, or prosodically abnormal speech. Traditional Automatic Speech Recognizers (ASR) perform poorly on dysarthric speech recognition tasks, owing mostly to insufficient dysarthric speech data. Speaker-related challenges complicate the data collection process for dysarthric speech. In this paper, we explore data augmentation using temporal and speed modifications to healthy speech to simulate dysarthric speech. DNN-HMM based ASR and Random Forest based classification were used to evaluate the proposed method. Dysarthric speech, generated synthetically, is classified for severity level using a Random Forest classifier that is trained on actual dysarthric speech. An ASR trained on healthy speech augmented with simulated dysarthric speech is evaluated for dysarthric speech recognition. All evaluations were carried out using the Universal Access dysarthric speech corpus. Absolute improvements of 4.24% and 2% were achieved using tempo-based and speed-based data augmentation, respectively, compared to ASR performance using healthy speech alone for training.
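
Illustrative note: the abstract does not state how the speed modification was implemented, so the following is only a minimal sketch of one common form of speed perturbation, resampling the waveform by linear interpolation. The function name `speed_perturb`, the factor 0.8, and the 200 Hz test tone are assumptions made for illustration, not values from the paper. Resampling in this way changes both duration and pitch; a tempo modification, by contrast, changes duration while preserving pitch and would need a time-stretching algorithm that is not shown here.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Resample `signal` so that it plays `factor` times faster.

    factor < 1 slows the speech down (longer, lower pitched);
    factor > 1 speeds it up. Linear interpolation is an assumption
    chosen for brevity, not the method used in the paper.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    signal = np.asarray(signal, dtype=float)
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    # np.interp holds the last sample for positions past the end.
    return np.interp(src_positions, np.arange(len(signal)), signal)


if __name__ == "__main__":
    sample_rate = 16000  # hypothetical sampling rate in Hz
    t = np.arange(sample_rate) / sample_rate
    healthy_like = np.sin(2 * np.pi * 200 * t)  # 1 s stand-in for healthy speech
    slowed = speed_perturb(healthy_like, 0.8)   # illustrative factor only
    print(len(healthy_like), len(slowed))
```

A factor below 1 lengthens the utterance, which loosely mimics the slow speaking rate the abstract associates with dysarthria; which factors correspond to which severity level is not given in this record.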


Pages: 471 - 475

Number of pages: 5

50 records in total

[1] Analysis for Using Noise as a Source of Data Augmentation for Dysarthric Speech Recognition
Nawroly, Sarkhell Sirwan
Popescu, Decebal
Celin, T. A. Mariya
Jeeva, M. P. Actlin
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025,
[2] Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
Jin, Zengrui
Geng, Mengzhe
Deng, Jiajun
Wang, Tianzi
Hu, Shujie
Li, Guinan
Liu, Xunying
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 413 - 429
[3] Few-shot dysarthric speech recognition with text-to-speech data augmentation
Hermann, Enno
Magimai-Doss, Mathew
INTERSPEECH 2023, 2023, : 156 - 160
[4] SNR-Selection-Based-Data Augmentation for Dysarthric Speech Recognition
Nawroly, Sarkhell Sirwan
Popescu, Decebal Gheorghe
Antony, Mariya Celin Thekekara
Philominal, Actlin Jeeva Muthu
STUDIES IN INFORMATICS AND CONTROL, 2023, 32 (04): : 129 - 140
[5] Exploring Alternative Data Augmentation Methods in Dysarthric Automatic Speech Recognition
Gracelli, Ricardo
Almeida, Jurandy
2024 IEEE 37TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS 2024, 2024, : 243 - 248
[6] SIMULATING DYSARTHRIC SPEECH FOR TRAINING DATA AUGMENTATION IN CLINICAL SPEECH APPLICATIONS
Jiao, Yishan
Tu, Ming
Berisha, Visar
Liss, Julie
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6009 - 6013
[7] Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation
Baali, Massa
Almakky, Ibrahim
Shehata, Shady
Karray, Fakhri
INTERSPEECH 2023, 2023, : 1558 - 1562
[8] SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION
Soleymanpour, Mohammad
Johnson, Michael T.
Soleymanpour, Rahim
Berry, Jeffrey
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7382 - 7386
[9] Using speech rhythm knowledge to improve dysarthric speech recognition
Selouani, S. -A.
Dahmani, H.
Amami, R.
Hamam, H.
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (01) : 57 - 64
[10] Accurate synthesis of dysarthric Speech for ASR data augmentation
Soleymanpour, Mohammad
Johnson, Michael T.
Soleymanpour, Rahim
Berry, Jeffrey
SPEECH COMMUNICATION, 2024, 164
