Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Cited by: 0
Authors
Galic, Jovan [1]
Markovic, Branko [2]
Grozdic, Dorde [3, 4]
Popovic, Branislav [5]
Sajic, Slavko [1]
Affiliations
[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg
[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia
[3] Grid Dynamics, Belgrade 11000, Serbia
[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia
[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18
Keywords
artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech
DOI
10.3390/app14188223
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer from a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore synthetic generation of whispered speech from pre-existing normal or whispered speech databases. This study examines the impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN). It also explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is used for data augmentation, while an internally recorded speech database, developed specifically for this study, is used for testing. Experimental results demonstrate a statistically significant improvement in performance when data augmentation strategies and inverse filtering are employed.
Pages: 20