Feature normalization based on non-extensive statistics for speech recognition

Cited by: 17
Authors
Pardede, Hilman F. [1 ]
Iwano, Koji [2 ]
Shinoda, Koichi [1 ]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Grad Sch Informat Sci & Engn, Meguro Ku, Tokyo 1528552, Japan
[2] Tokyo City Univ, Fac Environm & Informat Studies, Tsuzuki Ku, Yokohama, Kanagawa 2248551, Japan
Keywords
Robust speech recognition; Normalization; q-Logarithm; Non-extensive statistics; CROSS-TERMS; NOISE; MODEL; ENHANCEMENT; ENVIRONMENT; SPECTRA; ALGEBRA;
DOI
10.1016/j.specom.2013.02.004
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Most compensation methods for improving the robustness of speech recognition systems in noisy environments, such as spectral subtraction, CMN, and MVN, assume that noise and speech spectra are independent. However, the use of a limited window in signal processing may introduce a cross-term between them, which degrades speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics and propose q-log spectral mean normalization (q-LSMN), an extension of log spectral mean normalization (LSMN) to this domain. Recognition experiments on a synthesized noisy speech database, the Aurora-2 database, showed that q-LSMN was consistently better than the conventional normalization methods CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective when applied to a real noisy environment in the CENSREC-2 database, where it significantly outperformed the ETSI AFE front-end. (C) 2013 Elsevier B.V. All rights reserved.
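The idea of mean normalization in a q-log spectral domain can be illustrated with a minimal sketch. It uses the standard Tsallis q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q), which reduces to the natural logarithm as q approaches 1. The abstract does not give the exact q-LSMN formulation, so the function names `q_log` and `q_log_mean_normalize`, the per-bin utterance mean, and the use of ordinary subtraction are assumptions made for illustration, not the authors' method.

```python
import math


def q_log(x, q):
    """Tsallis q-logarithm of a positive value x.

    Falls back to the natural logarithm when q is (numerically) 1.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_log_mean_normalize(frames, q):
    """Hypothetical q-log mean normalization (illustrative assumption).

    frames: list of frames, each a list of positive spectral values
            (one value per frequency bin).
    Maps every value to the q-log domain, then subtracts the mean of
    each bin taken over all frames of the utterance.
    """
    n_frames = len(frames)
    n_bins = len(frames[0])
    mapped = [[q_log(v, q) for v in frame] for frame in frames]
    means = [sum(f[k] for f in mapped) / n_frames for k in range(n_bins)]
    return [[f[k] - means[k] for k in range(n_bins)] for f in mapped]


if __name__ == "__main__":
    # Two frames, two frequency bins; q = 0.5 is an arbitrary choice.
    spectra = [[1.0, 4.0], [9.0, 16.0]]
    print(q_log_mean_normalize(spectra, q=0.5))
```

With q = 1 this sketch coincides with conventional log spectral mean normalization, which is why q can be read as a parameter that generalizes LSMN.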
Pages: 587-599
Page count: 13
Related papers
50 in total
  • [41] First and second order non-equilibrium phase transition and evidence for non-extensive Tsallis statistics in Earth's magnetosphere
    Pavlos, G. P.
    Iliopoulos, A. C.
    Tsoutsouras, V. G.
    Sarafopoulos, D. V.
    Sfiris, D. S.
    Karakatsanis, L. P.
    Pavlos, E. G.
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2011, 390 (15) : 2819 - 2839
  • [42] On the closed form solutions for non-extensive Value at Risk
    Stavroyiannis, S.
    Makris, I.
    Nikolaidis, V.
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2009, 388 (17) : 3536 - 3542
  • [43] On Noise Robust Feature for Speech Recognition Based on Power Function Family
    Pardede, Hilman F.
    2015 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ISPACS), 2015, : 386 - 390
  • [44] Histogram equalization of contextual statistics of speech features for robust speech recognition
    Hsieh, Hsin-Ju
    Chen, Berlin
    Hung, Jeih-weih
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (17) : 6769 - 6795
  • [45] AN AUDITORY-BASED FEATURE FOR ROBUST SPEECH RECOGNITION
    Shao, Yang
    Jin, Zhaozhang
    Wang, DeLiang
    Srinivasan, Soundararajan
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4625 - +
  • [46] Adaptive channel normalization based on infomax algorithm for robust speech recognition
    Jung, Ho-Young
    ETRI JOURNAL, 2007, 29 (03) : 300 - 304
  • [47] Hedging for the Regime-Switching Price Model Based on Non-Extensive Statistical Mechanics
    Zhao, Pan
    Pan, Jian
    Zhou, Benda
    Wang, Jixia
    Song, Yu
    ENTROPY, 2018, 20 (04):
  • [48] Cepstral vector normalization based on stereo data for robust speech recognition
    Buera, Luis
    Lleida, Eduardo
    Miguel, Antonio
    Ortega, Alfonso
    Saz, Oscar
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03): : 1098 - 1113
  • [49] Fuzzy-based discriminative feature representation for children's speech recognition
    Mirhassani, Seyed Mostafa
    Ting, Hua-Nong
    DIGITAL SIGNAL PROCESSING, 2014, 31 : 102 - 114
  • [50] WEAK AND STRONG MAGNETIC FIELDS EFFECT ON THE NON-EXTENSIVE THERMODYNAMICS
    Tarek, Essam
    Ahmed, M. M.
    Shalaby, Asmaa G.
    ACTA PHYSICA POLONICA B, 2023, 54 (04):