Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Cited by: 0
Authors
Galic, Jovan [1]
Markovic, Branko [2]
Grozdic, Dorde [3, 4]
Popovic, Branislav [5]
Sajic, Slavko [1]
Affiliations
[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg
[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia
[3] Grid Dynamics, Belgrade 11000, Serbia
[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia
[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18
Keywords
artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech
DOI
10.3390/app14188223
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer from a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore the synthetic generation using pre-existing normal or whispered speech databases. The impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN) is examined in this research study. Furthermore, the study explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is utilized for data augmentation, while the internally recorded speech database, developed specifically for this study, is employed for testing purposes. Experimental results demonstrate statistically significant improvement in performance when employing data augmentation strategies and inverse filtering.
Pages: 20
Related papers
50 in total
  • [21] Analysis for Using Noise as a Source of Data Augmentation for Dysarthric Speech Recognition
    Nawroly, Sarkhell Sirwan
    Popescu, Decebal
    Celin, T. A. Mariya
    Jeeva, M. P. Actlin
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025
  • [22] Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition
    Sudro, Protima Nomo
    Das, Rohan Kumar
    Sinha, Rohit
    Prasanna, S. R. Mahadeva
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 484 - 490
  • [24] Speech intelligibility improvement in noisy reverberant environments based on speech enhancement and inverse filtering
    Dong, Huan-Yu
    Lee, Chang-Myung
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018
  • [25] Formant estimation of whispered speech based on spectral segmentation
    Gong Chenghui
    Zhao Heming
    Lu Gang
    Liu Hanxin
    2006 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2006, : 562 - +
  • [26] CNN-Based Audio Front End Processing on Speech Recognition
    Fan, Ruchao
    Liu, Gang
    2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2018, : 349 - 354
  • [27] Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data
    Pervaiz, Ayesha
    Hussain, Fawad
    Israr, Huma
    Tahir, Muhammad Ali
    Raja, Fawad Riasat
    Baloch, Naveed Khan
    Ishmanov, Farruh
    Zikria, Yousaf Bin
    SENSORS, 2020, 20 (08)
  • [28] AugMixSpeech: A Data Augmentation Method and Consistency Regularization for Mandarin Automatic Speech Recognition
    Jiang, Yang
    Chen, Jun
    Han, Kai
    Liu, Yi
    Ma, Siqi
    Song, Yuqing
    Liu, Zhe
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024, 2025, 15361 : 145 - 157
  • [29] COMPARISON OF PERFORMANCE BETWEEN NORMAL AND WHISPERED SPEECH IN CHINESE ISOLATED WORD RECOGNITION
    Sha, Jun
    Chen, Xueqin
    Yu, Yibiao
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 545 - 548
  • [30] Application of Teager Energy Operator on Linear and Mel Scales for Whispered Speech Recognition
    Markovic, Branko R.
    Galic, Jovan
    Mijic, Miomir
    ARCHIVES OF ACOUSTICS, 2018, 43 (01) : 3 - 9