Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Cited by: 28
Authors
Boucheron, Laura E. [1]
De Leon, Phillip L. [1]
Sandoval, Steven [1]
Affiliation
[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012, Vol. 20, No. 2
Keywords
Speech analysis; speech coding; objective quality measures; recognition
DOI
10.1109/TASL.2011.2162407
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, our results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T Recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), because the MFCCs can be applied directly, eliminating the additional decoding and feature-extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.
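As a rough illustration of the vector quantization idea mentioned in the abstract, the following Python sketch trains a small codebook with plain k-means and maps each feature vector to the index of its nearest codeword. It is not the authors' codec: the random stand-in data, the codebook size of 64, the 12-dimensional vectors, the 50 frames per second, and the function names `train_codebook`, `quantize`, and `bit_rate_bps` are all assumptions chosen for the example.

```python
import numpy as np

def quantize(vectors, codebook):
    """Return the index of the nearest codeword for each vector."""
    # Squared Euclidean distance between every vector and every codeword.
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def train_codebook(vectors, num_codewords, iterations=20, seed=0):
    """Train a VQ codebook with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords from randomly chosen training vectors.
    idx = rng.choice(len(vectors), size=num_codewords, replace=False)
    codebook = vectors[idx].copy()
    for _ in range(iterations):
        indices = quantize(vectors, codebook)
        for k in range(num_codewords):
            members = vectors[indices == k]
            if len(members) > 0:
                # Move the codeword to the mean of its assigned vectors.
                codebook[k] = members.mean(axis=0)
    return codebook

def bit_rate_bps(num_codewords, frames_per_second):
    """Bits per second when one codeword index is sent per frame."""
    bits_per_frame = int(np.ceil(np.log2(num_codewords)))
    return bits_per_frame * frames_per_second

# Stand-in data: random vectors in place of real MFCC frames (assumption).
rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 12))

codebook = train_codebook(frames, num_codewords=64)
indices = quantize(frames, codebook)   # what an encoder would transmit
decoded = codebook[indices]            # what a decoder would look up
rate = bit_rate_bps(64, frames_per_second=50)
```

With these assumed numbers, each frame costs 6 bits (the base-2 logarithm of 64), so a single codebook at 50 frames per second corresponds to 300 bps. A practical codec would typically split the coefficient vector across several codebooks and allocate bits unevenly among them, which this sketch does not attempt.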
Pages: 610-619
Number of pages: 10