Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Cited by: 0

Authors:

Galic, Jovan [1]

Markovic, Branko [2]

Grozdic, Dorde [3,4]

Popovic, Branislav [5]

Sajic, Slavko [1]

Affiliations:

[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg

[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia

[3] Grid Dynamics, Belgrade 11000, Serbia

[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia

[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18

Keywords:

artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech

DOI:

10.3390/app14188223

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer from a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore the synthetic generation using pre-existing normal or whispered speech databases. The impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN) is examined in this research study. Furthermore, the study explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is utilized for data augmentation, while the internally recorded speech database, developed specifically for this study, is employed for testing purposes. Experimental results demonstrate statistically significant improvement in performance when employing data augmentation strategies and inverse filtering.
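The abstract names inverse filtering but does not describe how it is applied, so the following is only a generic sketch of the textbook idea, not the authors' procedure. It assumes a single all-pole (LPC) model fitted to the whole signal with the autocorrelation method, and the function names `lpc_coefficients` and `inverse_filter` are hypothetical. Passing the signal through the inverse filter A(z) removes the estimated spectral envelope and leaves the prediction residual.

```python
import numpy as np


def lpc_coefficients(signal, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Uses the Levinson-Durbin recursion and returns a[0..order] with a[0] = 1,
    so that A(z) = sum_k a[k] z^-k is the inverse (whitening) filter.
    """
    n = len(signal)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(signal[: n - k], signal[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0.0:
        return a  # silent input: nothing to model
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a


def inverse_filter(signal, order=12):
    """Apply the LPC inverse filter A(z) and return the prediction residual."""
    a = lpc_coefficients(signal, order)
    # FIR filtering: e[n] = sum_k a[k] * x[n - k], truncated to the input length
    return np.convolve(signal, a)[: len(signal)]
```

A practical system would more likely work frame by frame with windowing, and the order of 12 used as a default here is an illustrative assumption rather than a value taken from the paper.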


Pages: 20

