Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Cited by: 28
Authors
Boucheron, Laura E. [1]
De Leon, Phillip L. [1]
Sandoval, Steven [1]
Affiliations
[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / Issue 02
Keywords
Speech analysis; speech coding; OBJECTIVE QUALITY MEASURES; RECOGNITION;
DOI
10.1109/TASL.2011.2162407
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, we show that the MFCC-based codec outperforms the state-of-the-art MELPe codec across the entire range of 600-2400 bps when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T Recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, thus eliminating additional decoding and feature-extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.
Pages: 610-619
Number of pages: 10