Feature normalization based on non-extensive statistics for speech recognition

Cited by: 17

Authors:

Pardede, Hilman F. ^{[1]}

Iwano, Koji ^{[2]}

Shinoda, Koichi ^{[1]}

Affiliations:

[1] Tokyo Inst Technol, Dept Comp Sci, Grad Sch Informat Sci & Engn, Meguro Ku, Tokyo 1528552, Japan

[2] Tokyo City Univ, Fac Environm & Informat Studies, Tsuzuki Ku, Yokohama, Kanagawa 2248551, Japan

Source:

SPEECH COMMUNICATION | 2013 / Vol. 55 / Issue 05

Keywords:

Robust speech recognition; Normalization; q-Logarithm; Non-extensive statistics; CROSS-TERMS; NOISE; MODEL; ENHANCEMENT; ENVIRONMENT; SPECTRA; ALGEBRA;

DOI:

10.1016/j.specom.2013.02.004

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Most compensation methods for improving the robustness of speech recognition systems in noisy environments, such as spectral subtraction, CMN, and MVN, rely on the assumption that noise and speech spectra are independent. However, the use of a limited window in signal processing may introduce a cross-term between them, which degrades speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics and propose q-log spectral mean normalization (q-LSMN), an extension of log spectral mean normalization (LSMN) to this domain. Recognition experiments on a synthesized noisy speech database, the Aurora-2 database, showed that q-LSMN was consistently better than the conventional normalization methods CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective when applied to a real noisy environment in the CENSREC-2 database, where it significantly outperformed the ETSI AFE front-end. (C) 2013 Elsevier B.V. All rights reserved.
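
The abstract does not give the q-LSMN formula, so the following is only a minimal sketch of the general idea. It uses the standard Tsallis q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q), which reduces to the natural logarithm as q approaches 1. The function `q_lsmn`, its per-bin mean subtraction over frames, and the choice of q are assumptions made for illustration, by analogy with ordinary log spectral mean normalization; they are not taken from the paper.

```python
import numpy as np


def q_log(x, q):
    """Tsallis q-logarithm: (x**(1-q) - 1) / (1-q); equals ln(x) when q == 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_lsmn(power_spec, q):
    """Hypothetical helper (assumption, not the paper's exact method).

    Maps a strictly positive spectrogram of shape (frames, bins) into the
    q-log domain and subtracts the mean over frames in each frequency bin.
    """
    q_spec = q_log(power_spec, q)
    return q_spec - q_spec.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    # Toy 2-frame, 2-bin "spectrogram"; values are illustrative only.
    spec = np.array([[1.0, 4.0], [9.0, 16.0]])
    print(q_lsmn(spec, q=0.5))
```

With q = 1 this sketch falls back to plain log spectral mean normalization, which is why a single parameter is enough to move between the two domains.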


Pages: 587-599

Number of pages: 13

