Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Cited by: 28
Authors
Boucheron, Laura E. [1]
De Leon, Phillip L. [1]
Sandoval, Steven [1]
Affiliation
[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA
Keywords
Speech analysis; speech coding; objective quality measures; recognition
DOI
10.1109/TASL.2011.2162407
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, our results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T Recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, thus eliminating additional decoding and feature-extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.
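The idea of vector-quantizing MFCC frames can be illustrated with a minimal sketch. This is not the codec described in the abstract: the 13-dimensional random vectors standing in for MFCC frames, the 256-entry codebook, the 100 frames-per-second rate, and the helper names `train_codebook`, `encode`, `decode`, and `bit_rate` are all assumptions made for illustration, and the codebook is trained with plain Lloyd (k-means) iterations.

```python
import numpy as np

def encode(vectors, codebook):
    """Return, for each vector, the index of its nearest codeword."""
    # Squared Euclidean distance between every vector and every codeword.
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def decode(indices, codebook):
    """Reconstruct vectors by looking up their codewords."""
    return codebook[indices]

def train_codebook(vectors, size, iters=20, seed=0):
    """Lloyd's algorithm (k-means) sketch for building a VQ codebook."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen training vectors.
    start = rng.choice(len(vectors), size, replace=False)
    codebook = vectors[start].copy()
    for _ in range(iters):
        idx = encode(vectors, codebook)
        for k in range(size):
            members = vectors[idx == k]
            if len(members):  # leave codewords of empty cells unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def bit_rate(codebook_size, frames_per_second):
    """Bits per second when one codeword index is sent per frame."""
    return np.log2(codebook_size) * frames_per_second

# Stand-in data: random vectors in place of real MFCC frames (assumption).
rng = np.random.default_rng(1)
frames = rng.normal(size=(2000, 13))

codebook = train_codebook(frames, size=256)
indices = encode(frames, codebook)          # what would be transmitted
reconstructed = decode(indices, codebook)   # what the receiver recovers
mse = ((frames - reconstructed) ** 2).sum(axis=1).mean()

print("bits per second:", bit_rate(256, 100))
print("mean squared quantization error:", mse)
```

In this sketch each frame costs log2(256) = 8 bits, so a hypothetical rate of 100 frames per second gives 800 bps; a real codec would trade codebook size, vector splitting, and frame rate against reconstruction quality.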
Pages: 610-619
Page count: 10
Related papers
50 in total
  • [1] Hybrid Scalar/Vector Quantization of Mel-Frequency Cepstral Coefficients for Low Bit-Rate Coding of Speech
    Boucheron, Laura E.
    De Leon, Phillip L.
    Sandoval, Steven
    2011 DATA COMPRESSION CONFERENCE (DCC), 2011, : 103 - 112
  • [2] Pitch quantization in low bit-rate speech coding
    Eriksson, T
    Kang, HG
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 489 - 492
  • [3] On the Inversion of Mel-Frequency Cepstral Coefficients for Speech Enhancement Applications
    Boucheron, Laura E.
    De Leon, Phillip L.
    ICSES 2008 INTERNATIONAL CONFERENCE ON SIGNALS AND ELECTRONIC SYSTEMS, CONFERENCE PROCEEDINGS, 2008, : 485 - 488
  • [4] Automatic recognition of birdsongs using mel-frequency cepstral coefficients and vector quantization
    Lee, Chang-Hsing
    Lien, Cheng-Chang
    Huang, Ren-Zhuang
    IMECS 2006: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, 2006, : 331 - +
  • [5] Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients
    Palo, Hemanta Kumar
    Chandra, Mahesh
    Mohanty, Mihir Narayan
    ADVANCES IN SYSTEMS, CONTROL AND AUTOMATION, 2018, 442 : 491 - 498
  • [6] Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction
    Shao, X
    Milner, B
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2005, 118 (02) : 1134 - 1143
  • [7] Emotion Recognition from Speech Signal Using Mel-Frequency Cepstral Coefficients
    Korkmaz, Onur Erdem
    Atasoy, Ayten
    2015 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ELECO), 2015, : 1254 - 1257
  • [8] Fingerprint Recognition Using Mel-Frequency Cepstral Coefficients
    Hashad F.G.
    Halim T.M.
    Diab S.M.
    Sallam B.M.
    El-Samie F.E.A.
    Pattern Recognition and Image Analysis, 2010, 20 (03) : 360 - 369
  • [9] Computing Mel-frequency cepstral coefficients on the power spectrum
    Molau, S
    Pitz, M
    Schlüter, R
    Ney, H
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 73 - 76
  • [10] Mel-frequency Cepstral Coefficients for Eye Movement Identification
    Nguyen Viet Cuong
    Vu Dinh
    Lam Si Tung Ho
    2012 IEEE 24TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2012), VOL 1, 2012, : 253 - 260