Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Cited by: 0

Authors:

Galic, Jovan [1]

Markovic, Branko [2]

Grozdic, Dorde [3,4]

Popovic, Branislav [5]

Sajic, Slavko [1]

Affiliations:

[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg

[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia

[3] Grid Dynamics, Belgrade 11000, Serbia

[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia

[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18

Keywords:

artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech

DOI:

10.3390/app14188223

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Because of the considerable acoustic mismatch between normal speech and whisper, ASR systems suffer a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore generating such data synthetically from pre-existing normal or whispered speech databases. This study examines the impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN). It also explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is used for data augmentation, while an internally recorded speech database, developed specifically for this study, is used for testing. Experimental results demonstrate a statistically significant improvement in performance when data augmentation strategies and inverse filtering are employed.
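
The abstract does not name the specific augmentation techniques evaluated. As an illustration only, the sketch below shows one commonly used form of audio data augmentation: mixing white Gaussian noise into a recording at a chosen signal-to-noise ratio. The function name `add_noise_at_snr`, the 10 dB target, and the synthetic sine-wave input are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; not the procedure used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    noise = rng.standard_normal(signal.shape)

    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose the scale so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise


# Toy input: one second of a 220 Hz sine wave at 16 kHz (a stand-in for a speech recording).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)

# Fixed seed so the augmented copy is reproducible.
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=np.random.default_rng(0))
```

In an augmentation pipeline of this kind, each training utterance would typically be copied several times at different SNR values, with the augmented copies added to the training set while the test set is left unchanged.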

Pages: 20
