Strategies to improve the performance of very low bit rate speech coders and application to a variable rate 1.2 kb/s codec

Cited by: 13

Authors:

de Lamare, RC [1]

Alcaim, A [1]

Affiliation:

[1] CETUC PUC RIO, BR-22453900 Rio De Janeiro, Brazil

Source:

IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING | 2005 / Vol. 152 / No. 01

DOI:

10.1049/ip-vis:20051189

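Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract: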

This paper presents several strategies to improve the performance of very low bit rate speech coders and describes a speech codec that incorporates these strategies and operates at an average bit rate of 1.2 kb/s. The encoding algorithm is based on several improvements to a mixed multiband excitation (MMBE) linear predictive coding (LPC) structure. A switched-predictive vector quantiser technique that outperforms previously reported schemes is adopted to encode the LSF parameters. Spectral and sound-specific low-rate models are used to achieve high-quality speech at low rates. An MMBE approach with three sub-bands is employed to encode voiced frames, while modelling and synthesis techniques for fricatives and stops are used for unvoiced frames. This strategy is shown to provide good-quality synthesised speech at a bit rate of only 0.4 kb/s for unvoiced frames. To reduce coding noise and improve the decoded speech, a spectral envelope restoration combined with noise reduction (SERNR) postfilter is used. The contributions of the techniques described in this paper are assessed separately and then combined in the design of a low bit rate codec that is evaluated against the North American Mixed Excitation Linear Prediction (MELP) coder. The performance assessment is carried out in terms of the spectral distortion of LSF quantisation, mean opinion score (MOS), A/B comparison tests and the ITU-T P.862 perceptual evaluation of speech quality (PESQ) standard. Assessment results show that the improved methods for LSF quantisation, sound-specific modelling and synthesis, and the new postfiltering approach can significantly outperform previously reported techniques. Further results indicate that a system combining the proposed improvements and operating at 1.2 kb/s is comparable to, and slightly outperforms, a MELP coder operating at 2.4 kb/s. For tandem connection situations, the proposed system is clearly superior to the MELP coder.
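The abstract names spectral distortion as the measure used to assess LSF quantisation. As a rough illustration only, the sketch below computes the commonly used root-mean-square log-spectral distortion (in dB) between an unquantised and a quantised LPC envelope for a single frame. The function names, the FFT size and the use of the full 0 to pi band are assumptions made here; the record does not state which variant or frequency range the authors used.

```python
import numpy as np


def lpc_log_spectrum_db(a, n_fft=512):
    """Log power spectrum in dB of the all-pole filter 1/A(z).

    `a` holds the LPC polynomial coefficients [1, a1, ..., ap].
    (Hypothetical helper, not taken from the paper.)
    """
    # Evaluate A(z) on the upper half of the unit circle via a zero-padded FFT.
    A = np.fft.rfft(a, n_fft)
    # 10*log10(1/|A|^2) equals -20*log10|A|.
    return -20.0 * np.log10(np.abs(A))


def spectral_distortion_db(a_ref, a_quant, n_fft=512):
    """RMS difference between two LPC log spectra, in dB, for one frame."""
    diff = lpc_log_spectrum_db(a_ref, n_fft) - lpc_log_spectrum_db(a_quant, n_fft)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Toy first-order example; the coefficient values are illustrative only.
    a_original = [1.0, -0.90]
    a_quantised = [1.0, -0.85]
    print(spectral_distortion_db(a_original, a_quantised))
```

In practice the per-frame values are averaged over a test set, and the LPC coefficients would first be obtained by converting the original and quantised LSF vectors back to polynomial form, a step omitted here.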

Pages: 74-86

Number of pages: 13
