Enhancing robustness for speech recognition through bio-inspired auditory filter-bank

Cited by: 1
Authors
Maganti, Hari Krishna [1 ]
Matassoni, Marco [1 ]
Affiliations
[1] Fdn Bruno Kessler Irst, Ctr Informat Technol, I-38123 Trento, Italy
Keywords
speech recognition; robustness; reverberant environment; feature extraction; auditory processing; lateral inhibition and level dependent frequency analysis;
DOI
10.1504/IJBIC.2012.049884
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
One of the important properties observed in basilar membrane filtering, which contributes to the robustness of the human ear, is lateral-inhibition-based level-dependent frequency resolution. However, this particular property has not been extensively exploited to improve the robustness of speech processing systems. In this work, an auditory filter-bank that incorporates lateral inhibition dependent on the input stimulus, providing a good fit to human auditory masking data, is used to improve the robustness of a speech recognition system. The gammachirp auditory filter is the real part of the analytic gammachirp function, which has been shown to provide an accurate description of the asymmetry and lateral inhibition observed in basilar membrane filtering. The gammachirp is characterised by asymmetry in the low-frequency tail of the auditory filter response and models level-dependent properties such as a decrease in gain and a shift in the centre frequency of the filter as the stimulus level increases. Speech recognition experiments using the standard HTK framework are performed on the standard Aurora-5 digit task database, comprising both simulated data and real data recorded with distant microphones in hands-free mode in a real meeting room. The gammachirp-based features show reliable and consistent improvements over the conventional features used for speech recognition.
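To make the filter described above concrete, the following is a minimal Python sketch (not the authors' implementation) of a gammachirp impulse response in the form introduced by Irino and Patterson, g(t) = a t^(n-1) exp(-2πb·ERB(fr)·t) cos(2πfr·t + c·ln t), together with a toy filter-bank and log-energy feature. The parameter values, centre frequencies, and the random test signal are illustrative assumptions only; in a level-dependent variant the chirp parameter c would vary with the stimulus level, whereas here it is held fixed for brevity.

import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammachirp(fr_hz, fs=16000, n=4, b=1.019, c=-2.0, dur=0.025):
    """Illustrative impulse response of a single gammachirp channel."""
    t = np.arange(1, int(dur * fs)) / fs          # t > 0 keeps ln(t) finite
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fr_hz) * t)
    carrier = np.cos(2 * np.pi * fr_hz * t + c * np.log(t))
    g = env * carrier
    return g / np.sqrt(np.sum(g ** 2))            # unit-energy normalisation

if __name__ == "__main__":
    fs = 16000
    x = np.random.randn(fs)                       # stand-in for a speech frame
    centre_freqs = [250, 500, 1000, 2000, 4000]   # Hz, illustrative spacing
    outputs = [np.convolve(x, gammachirp(fc, fs), mode="same")
               for fc in centre_freqs]
    energies = [float(np.log(np.mean(y ** 2) + 1e-12)) for y in outputs]
    print(energies)                               # log channel energies

The negative chirp term skews energy toward the low-frequency tail of each channel, which is the asymmetry the abstract refers to; a full feature pipeline would follow the filter-bank with the usual frame-level processing (e.g., DCT and deltas).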
Pages: 271-277
Number of pages: 7
Related Papers (34 in total)
  • [1] A Level-dependent Auditory Filter-bank for Speech Recognition in Reverberant Environments
    Maganti, Hari Krishna
    Matassoni, Marco
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 692 - 695
  • [2] Filtering of filter-bank energies for robust speech recognition
    Jung, HY
    ETRI JOURNAL, 2004, 26 (03) : 273 - 276
  • [3] Performance and robustness of bio-inspired digital liquid state machines: A case study of speech recognition
    Jin, Yingyezhe
    Li, Peng
    NEUROCOMPUTING, 2017, 226 : 145 - 160
  • [4] Bilinear map of filter-bank outputs for DNN-based speech recognition
    Ogawa, Tetsuji
    Ueda, Kenshiro
    Katsurada, Kouichi
    Kobayashi, Tetsunori
    Nitta, Tsuneo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 16 - 20
  • [5] A bio-inspired feature extraction for robust speech recognition
    Zouhir, Youssef
    Ouni, Kais
    SPRINGERPLUS, 2014, 3
  • [6] An Auditory Inspired Amplitude Modulation Filter Bank for Robust Feature Extraction in Automatic Speech Recognition
    Moritz, Niko
    Anemueller, Joern
    Kollmeier, Birger
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 1926 - 1937
  • [7] INSTANTANEOUS FREQUENCY FILTER-BANK FEATURES FOR LOW RESOURCE SPEECH RECOGNITION USING DEEP RECURRENT ARCHITECTURES
    Nayak, Shekhar
    Kumar, C. Shiva
    Murty, K. Sri Rama
    2021 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2021, : 105 - 110
  • [8] Evaluation of a feature selection scheme on ICA-based filter-bank for speech recognition
    Faraji, Neda
    Ahadi, S. M.
    2007 6TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS & SIGNAL PROCESSING, VOLS 1-4, 2007, : 1277 - 1281
  • [9] Bio-Inspired Filter Banks for Frequency Recognition of SSVEP-Based Brain-Computer Interfaces
    Demir, Ali Fatih
    Arslan, Huseyin
    Uysal, Ismail
    IEEE ACCESS, 2019, 7 : 160295 - 160303
  • [10] Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing
    Sai, B. Tarun
    Yadav, Ishwar Chandra
    Shahnawazuddin, S.
    Pradhan, Gayadhar
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 242 - 246