Improving speech command recognition through decision-level fusion of deep filtered speech cues

Cited by: 5
Authors
Mehra, Sunakshi [1]
Ranga, Virender [1]
Agarwal, Ritu [1]
Affiliation
[1] Delhi Technological University, Department of Information Technology, Delhi, India
Keywords
Speech filtering techniques; Swin-tiny transformer; Feed-forward neural network (FNN); Speech command recognition; Enhancement
DOI
10.1007/s11760-023-02845-z
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
Living beings communicate through speech, which can be analysed to identify words and sentences by recognizing the flow of spoken utterances. However, background noise inevitably affects the speech recognition process, and the detection rate in the presence of background noise remains unsatisfactory, so further research into remedies is needed. To enhance noisy speech, this research proposes speech recognition based on a combination of median filtering and adaptive filtering. Speech command recognition is achieved by applying these popular noise reduction techniques in two parallel, independent channels of filtered speech. The procedure involves five steps. First, signals are enhanced using two parallel, independent speech enhancement models (median filtering and adaptive filtering). Second, 2D Mel spectrogram images are extracted from the enhanced signals. Third, the 2D Mel spectrogram images are passed to the tiny Swin Transformer, which classifies them among the classes of the large-scale ImageNet dataset (about 14 million images, approximately 150 GB in size). Fourth, the posterior probabilities produced by the tiny Swin Transformer are fed into our proposed 3-layer feed-forward network, which classifies them into our 10 speech command categories. Fifth, decision-level fusion is applied to the outputs of the two parallel, independent channels of the 3-layer feed-forward network. The Google Speech Commands dataset (version 2) is used for experimentation. We obtained a test accuracy of 99.85%, a satisfactory result when compared with other state-of-the-art methods.
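The abstract does not specify the adaptive filter, the window size of the median filter, the fusion rule, or which ten command labels are used. The following Python sketch (standard library only) illustrates two of the steps under stated assumptions: a sliding-window median filter for the median-filtering channel, and decision-level fusion by a weighted average of the two channels' class-posterior vectors followed by an argmax. The truncated-window edge handling, the averaging rule, the default weight of 0.5, the example labels, and the names median_filter, fuse_posteriors, and fused_decision are all hypothetical choices for illustration, not the authors' implementation.

```python
from statistics import median
from typing import List, Sequence


def median_filter(signal: Sequence[float], kernel_size: int = 3) -> List[float]:
    """Replace each sample with the median of a window centred on it.

    Assumption: near the edges the window is simply truncated; zero- or
    reflect-padding are common alternatives.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    n = len(signal)
    return [
        float(median(signal[max(0, i - half): min(n, i + half + 1)]))
        for i in range(n)
    ]


def fuse_posteriors(p_a: Sequence[float], p_b: Sequence[float],
                    weight: float = 0.5) -> List[float]:
    """Weighted average of two channels' posteriors (assumed fusion rule)."""
    if len(p_a) != len(p_b):
        raise ValueError("posterior vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_a, p_b)]


def fused_decision(p_median: Sequence[float], p_adaptive: Sequence[float],
                   labels: Sequence[str], weight: float = 0.5) -> str:
    """Return the label whose fused posterior is highest."""
    fused = fuse_posteriors(p_median, p_adaptive, weight)
    if len(labels) != len(fused):
        raise ValueError("need exactly one label per class")
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


if __name__ == "__main__":
    # An isolated impulse is removed by a 3-sample median filter.
    print(median_filter([0.0, 0.0, 5.0, 0.0, 0.0]))
    # Hypothetical 3-class posteriors from the two filtered channels.
    commands = ["yes", "no", "up"]
    print(fused_decision([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], commands))
```

For example, median_filter([0.0, 0.0, 5.0, 0.0, 0.0]) removes the isolated spike and returns all zeros, and fusing [0.6, 0.3, 0.1] with [0.2, 0.7, 0.1] gives [0.4, 0.5, 0.1], so the fused decision is "no" even though the median-filtered channel alone would have picked "yes". Averaging is only one possible decision-level rule; majority voting, maximum, or product of posteriors are other common options.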
Pages: 1365-1373
Page count: 9