Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Cited by: 28
Authors
Boucheron, Laura E. [1]
De Leon, Phillip L. [1]
Sandoval, Steven [1]
Affiliations
[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA
Keywords
Speech analysis; speech coding; objective quality measures; recognition
DOI
10.1109/TASL.2011.2162407
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, we show that the MFCC-based codec outperforms the state-of-the-art MELPe codec across the entire range of 600-2400 bps when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T Recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, eliminating the additional decoding and feature-extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.
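As a rough illustration of the vector quantization step the abstract refers to, the following sketch trains a small codebook with plain k-means and computes the resulting bit rate. It is not the authors' codec: the random stand-in features, the 13-dimensional vectors, the 64-entry codebook, the 100 frames-per-second rate, and the function names `train_codebook`, `quantize`, and `bit_rate` are all assumptions chosen for the example.

```python
import numpy as np

def quantize(vectors, codebook):
    """Return the index of the nearest codeword (squared Euclidean) per vector."""
    distances = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return distances.argmin(axis=1)

def train_codebook(vectors, num_codewords, iterations=20, seed=0):
    """Fit a VQ codebook with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from randomly chosen training vectors.
    start = rng.choice(len(vectors), num_codewords, replace=False)
    codebook = vectors[start].copy()
    for _ in range(iterations):
        indices = quantize(vectors, codebook)
        for k in range(num_codewords):
            members = vectors[indices == k]
            if len(members) > 0:  # leave codewords of empty cells unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def bit_rate(bits_per_frame, frames_per_second):
    """Bits per second when one codeword index is sent per frame."""
    return bits_per_frame * frames_per_second

# Stand-in data: random vectors in place of real MFCC frames (assumption).
rng = np.random.default_rng(1)
features = rng.normal(size=(500, 13))

codebook = train_codebook(features, num_codewords=64)
indices = quantize(features, codebook)   # what an encoder would transmit
reconstructed = codebook[indices]        # what a decoder would look up

# Bits needed to address one codeword: ceil(log2(codebook size)).
bits_per_frame = (len(codebook) - 1).bit_length()
print(bit_rate(bits_per_frame, frames_per_second=100))  # 600
```

Only the codeword indices cross the channel, so the bit rate is set by the codebook size and the frame rate; a larger codebook lowers distortion at the cost of more bits per frame.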
Pages: 610-619
Page count: 10
Related papers
50 in total
  • [41] Mel-Frequency Cepstral Coefficient-Based Bandwidth Extension of Narrowband Speech
    Nour-Eldin, Amr H.
    Kabal, Peter
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 53 - 56
  • [42] LOW BIT-RATE VIDEO CODING USING WAVELET VECTOR QUANTIZATION
    SAMPSON, DG
    DASILVA, EAB
    GHANBARI, M
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1995, 142 (03): : 141 - 148
  • [43] Zone coding of DCT coefficients for very low bit-rate video coding
    Ngamwitthayanon, N
    Ratanasanya, S
    Amornraksa, T
    IEEE ICIT' 02: 2002 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY, VOLS I AND II, PROCEEDINGS, 2002, : 769 - 773
  • [44] Codebook design considerations for low bit-rate speech coding using joint segmentation-quantization
    Mayrench, R
    Malah, D
    21ST IEEE CONVENTION OF THE ELECTRICAL AND ELECTRONIC ENGINEERS IN ISRAEL - IEEE PROCEEDINGS, 2000, : 398 - 401
  • [45] Low-variance Multitaper Mel-frequency Cepstral Coefficient Features for Speech and Speaker Recognition Systems
    Md. Jahangir Alam
    Patrick Kenny
    Douglas O’Shaughnessy
    Cognitive Computation, 2013, 5 : 533 - 544
  • [46] Speech reconstruction from mel frequency cepstral coefficients and pitch frequency
    Chazan, D
    Hoory, R
    Cohen, G
    Zibulski, M
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1299 - 1302
  • [47] Low-variance Multitaper Mel-frequency Cepstral Coefficient Features for Speech and Speaker Recognition Systems
    Alam, Md. Jahangir
    Kenny, Patrick
    O'Shaughnessy, Douglas
    COGNITIVE COMPUTATION, 2013, 5 (04) : 533 - 544
  • [48] Chip design of mel frequency cepstral coefficients for speech recognition
    Wang, JC
    Wang, JF
    Weng, YS
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 3658 - 3661
  • [49] Split-Dimension Vector Quantization of Parcor Coefficients for Low Bit Rate Speech Coding
    Law, Kwok-Wah
    Chan, Cheung-Fat
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (03): : 443 - 446
  • [50] SPEECH RECONSTRUCTION FOR MFCC-BASED LOW BIT-RATE SPEECH CODING
    Jiang Wenbin
    Ying Rendong
    Liu Peilin
    2014 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2014,