Hybrid Scalar/Vector Quantization of Mel-Frequency Cepstral Coefficients for Low Bit-Rate Coding of Speech

Cited by: 4
Authors
Boucheron, Laura E. [1]
De Leon, Phillip L. [1]
Sandoval, Steven [1]
Affiliations
[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA
Keywords
RECOGNITION;
DOI
10.1109/DCC.2011.17
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this paper, we propose a low bit-rate speech codec based on a hybrid scalar/vector quantization of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of explicit phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, we show that the perceptual evaluation of speech quality (PESQ) score of the MFCC-based codec matches that of the state-of-the-art MELPe codec at 600 bps and exceeds that of the CELP codec at coding rates of 2000-4000 bps. The main advantage of the proposed codec is in distributed speech recognition (DSR): speech features based on MFCCs can be obtained directly from the codewords, eliminating the additional decoding and feature-extraction stages.
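The abstract does not give the codec's parameters, so the following is only a minimal sketch of what a hybrid scalar/vector quantizer over one MFCC frame can look like. The split point (`n_scalar`), the uniform quantizer range and bit depth, and the tiny codebook are all illustrative assumptions, not values from the paper; a real codebook would be trained on speech data.

```python
from typing import List, Sequence, Tuple


def scalar_quantize(x: float, lo: float, hi: float, bits: int) -> int:
    """Uniform scalar quantizer: clip x to [lo, hi] and map it to an integer index."""
    levels = (1 << bits) - 1
    clipped = min(max(x, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def scalar_dequantize(idx: int, lo: float, hi: float, bits: int) -> float:
    """Inverse of scalar_quantize: map an index back to its reconstruction value."""
    levels = (1 << bits) - 1
    return lo + idx / levels * (hi - lo)


def vector_quantize(v: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword nearest to v (squared Euclidean distance)."""
    def dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(v, codebook[i]))
    return min(range(len(codebook)), key=dist)


def encode(mfcc: Sequence[float], n_scalar: int, lo: float, hi: float,
           bits: int, codebook: Sequence[Sequence[float]]) -> Tuple[List[int], int]:
    """Hybrid encoding (assumed split): the first n_scalar coefficients are
    scalar-quantized one by one; the remaining ones are vector-quantized jointly."""
    scalar_idx = [scalar_quantize(c, lo, hi, bits) for c in mfcc[:n_scalar]]
    vq_idx = vector_quantize(mfcc[n_scalar:], codebook)
    return scalar_idx, vq_idx


def decode(scalar_idx: Sequence[int], vq_idx: int, lo: float, hi: float,
           bits: int, codebook: Sequence[Sequence[float]]) -> List[float]:
    """Rebuild an approximate MFCC frame directly from the transmitted indices."""
    head = [scalar_dequantize(i, lo, hi, bits) for i in scalar_idx]
    return head + list(codebook[vq_idx])


# Toy frame and codebook, purely illustrative.
frame = [4.0, -2.0, 0.9, 0.1]
toy_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
indices = encode(frame, n_scalar=2, lo=-8.0, hi=8.0, bits=4, codebook=toy_codebook)
approx = decode(indices[0], indices[1], lo=-8.0, hi=8.0, bits=4, codebook=toy_codebook)
```

In this sketch `decode` returns MFCC values straight from the indices, which mirrors the DSR point made in the abstract: a recognizer could consume the dequantized coefficients without first synthesizing a waveform.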
Pages: 103-112
Page count: 10
Related papers
50 in total
  • [1] Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients
    Boucheron, Laura E.
    De Leon, Phillip L.
    Sandoval, Steven
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (02): : 610 - 619
  • [2] Automatic recognition of birdsongs using mel-frequency cepstral coefficients and vector quantization
    Lee, Chang-Hsing
    Lien, Cheng-Chang
    Huang, Ren-Zhuang
    IMECS 2006: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, 2006, : 331 - +
  • [3] Pitch quantization in low bit-rate speech coding
    Eriksson, T
    Kang, HG
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 489 - 492
  • [4] On the Inversion of Mel-Frequency Cepstral Coefficients for Speech Enhancement Applications
    Boucheron, Laura E.
    De Leon, Phillip L.
    ICSES 2008 INTERNATIONAL CONFERENCE ON SIGNALS AND ELECTRONIC SYSTEMS, CONFERENCE PROCEEDINGS, 2008, : 485 - 488
  • [5] Low bit-rate speech coding with predictive multi-level vector quantization
    Yu, Xingye
    Li, Ye
    Zhang, Peng
    Lin, Lingxia
    Cai, Tianyu
    APPLIED ACOUSTICS, 2025, 231
  • [6] Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients
    Palo, Hemanta Kumar
    Chandra, Mahesh
    Mohanty, Mihir Narayan
    ADVANCES IN SYSTEMS, CONTROL AND AUTOMATION, 2018, 442 : 491 - 498
  • [7] Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction
    Shao, X
    Milner, B
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2005, 118 (02): : 1134 - 1143
  • [8] Emotion Recognition from Speech Signal Using Mel-Frequency Cepstral Coefficients
    Korkmaz, Onur Erdem
    Atasoy, Ayten
    2015 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ELECO), 2015, : 1254 - 1257
  • [9] LOW BIT-RATE VIDEO CODING USING WAVELET VECTOR QUANTIZATION
    SAMPSON, DG
    DASILVA, EAB
    GHANBARI, M
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1995, 142 (03): : 141 - 148
  • [10] Prediction of fundamental frequency and voicing from mel-frequency cepstral coefficients for unconstrained speech reconstruction
    Milner, Ben
    Shao, Xu
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01): : 24 - 33