Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Cited by: 0

Authors:

Galic, Jovan [1]

Markovic, Branko [2]

Grozdic, Dorde [3,4]

Popovic, Branislav [5]

Sajic, Slavko [1]

Affiliations:

[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg

[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia

[3] Grid Dynamics, Belgrade 11000, Serbia

[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia

[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18

Keywords:

artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech;

DOI:

10.3390/app14188223

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore synthetic data generation using pre-existing normal or whispered speech databases. This study examines the impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN). Furthermore, the study explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is used for data augmentation, while an internally recorded speech database, developed specifically for this study, is used for testing. Experimental results demonstrate a statistically significant improvement in performance when employing data augmentation strategies and inverse filtering.


Pages: 20
