Enhancing robustness for speech recognition through bio-inspired auditory filter-bank

Cited by: 1
Authors
Maganti, Hari Krishna [1 ]
Matassoni, Marco [1 ]
Affiliations
[1] Fdn Bruno Kessler Irst, Ctr Informat Technol, I-38123 Trento, Italy
Keywords
speech recognition; robustness; reverberant environment; feature extraction; auditory processing; lateral inhibition and level dependent frequency analysis
DOI
10.1504/IJBIC.2012.049884
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One important property of basilar membrane filtering that contributes to the robustness of the human ear is lateral-inhibition-based, level-dependent frequency resolution. However, this property has not been extensively considered for improving the robustness of speech processing systems. In this work, an auditory filter-bank that includes lateral inhibition driven by the input stimulus, and that provides a good fit to human auditory masking, is used to improve the robustness of a speech recognition system. The gammachirp auditory filter is the real part of the analytic gammachirp function, which has been shown to accurately describe the asymmetry and lateral inhibition observed in basilar membrane filtering. The gammachirp is characterised by asymmetry in the low-frequency tail of the auditory filter response, and it models level-dependent properties such as a decrease in gain and a shift in the centre frequency of the filter as the level increases. Speech recognition experiments using the standard HTK framework are performed on the standard Aurora-5 digit task database, covering both simulated data and real data recorded with distant microphones in hands-free mode in a real meeting room. The gammachirp-based features show reliable and consistent improvements over the conventional features used for speech recognition.
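The abstract does not state the filter equation, so the following is only a minimal sketch of a gammachirp impulse response in the form commonly given in the auditory-modelling literature: g(t) = t^(n-1) exp(-2π b ERB(f_r) t) cos(2π f_r t + c ln t + φ). The function names, the sampling rate, and the values of n, b, and c are illustrative assumptions and are not taken from the paper. In a level-dependent model the chirp parameter c would vary with the input level; here it is a fixed argument.

```python
import numpy as np


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 + 0.108 * f_hz


def gammachirp(fr_hz, fs_hz=16000, duration_s=0.025, n=4, b=1.019, c=-2.0, phase=0.0):
    """Real-valued gammachirp impulse response, normalised to unit peak amplitude.

    All default parameter values are illustrative assumptions, not the paper's settings.
    Setting c=0 removes the chirp term and yields a gammatone filter.
    """
    num_samples = int(round(duration_s * fs_hz))
    # Start at one sample period, because ln(t) is undefined at t = 0.
    t = np.arange(1, num_samples + 1) / fs_hz
    envelope = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb(fr_hz) * t)
    carrier = np.cos(2.0 * np.pi * fr_hz * t + c * np.log(t) + phase)
    g = envelope * carrier
    return g / np.max(np.abs(g))


def peak_frequency_hz(impulse_response, fs_hz=16000, n_fft=8192):
    """Frequency (Hz) at which the magnitude response is largest."""
    spectrum = np.abs(np.fft.rfft(impulse_response, n_fft))
    return np.argmax(spectrum) * fs_hz / n_fft


if __name__ == "__main__":
    tone = gammachirp(1000.0, c=0.0)    # symmetric gammatone reference
    chirp = gammachirp(1000.0, c=-2.0)  # negative c skews the response
    print("gammatone peak (Hz):", peak_frequency_hz(tone))
    print("gammachirp peak (Hz):", peak_frequency_hz(chirp))
```

With a negative c, the peak of the magnitude response moves below the nominal frequency f_r and the response becomes asymmetric, which is the kind of centre-frequency shift the abstract attributes to the level-dependent gammachirp.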
Pages: 271-277
Number of pages: 7