Time Envelope Vocoder, a New LP Based Coding Strategy for Use at Bit Rates of 2.4 kb/s and Below

Cited by: 2
Authors
ATKINSON, IA
KONDOZ, AM
EVANS, BG
Affiliation
[1] Centre for Satellite Engineering Research, University of Surrey, Guildford
Keywords
Algorithms - Closed loop control systems - Computational complexity - Computer simulation - Correlation theory - Decoding - Mathematical models - Speech analysis - Speech coding - Speech synthesis
DOI
10.1109/49.345890
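Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract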
This paper presents a linear prediction (LP) based vocoder employing a novel technique which ensures smooth evolution of the synthetic speech waveform. In this coder, speech waveforms are considered as having a 'time envelope', the shape of which contains important perceptual information. By ensuring that the time envelope of the synthetic speech closely matches that of the original, natural-sounding synthetic speech can be produced. Envelope matching may be achieved using a new, low-complexity analysis-by-synthesis loop at the decoder which determines the synthetic excitation energy. The advantage over more traditional linear prediction vocoders is that the amplitude time envelope is preserved in addition to the spectral envelope, allowing the rapid amplitude transitions associated with onsets to be retained in the synthetic speech, resulting in a more intelligible output. Simply controlling the overall energy of the synthetic excitation is not sufficient to accurately control the synthetic speech energy. Small changes in linear prediction or pitch parameters due to quantization, for example, can cause variations in the synthetic speech amplitude, especially from one pitch cycle to the next, resulting in noisy synthetic speech. The inclusion of an analysis-by-synthesis loop at the decoder ensures that the synthetic speech amplitude is independent of variations in the pitch period and LP parameters. This paper presents a complete vocoder scheme using time envelope matching, including details of techniques such as parameter interpolation, excitation pulse shaping and pitch tracking which have proven necessary to produce natural-sounding synthetic speech at 2.4 kb/s and below.
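The abstract does not give the details of the decoder loop, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes the decoder receives a target RMS value of the time envelope for each short block (for example, one pitch cycle) and chooses the excitation gain so that the synthesized output of that block, including the ringing left in the LP filter from earlier blocks, has exactly that RMS. All names (`lp_synthesize`, `match_block_gain`, `rms`), the block-wise RMS target, and the sign convention A(z) = 1 + sum of a[k] z^-k are assumptions made for this example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def lp_synthesize(excitation, lp_coeffs, state):
    """All-pole synthesis filter: s[n] = e[n] - sum_k a[k] * s[n-k].

    `state` holds the most recent outputs, newest first (s[n-1], s[n-2], ...).
    Returns the output block and the updated filter state.
    """
    state = list(state)
    out = []
    for e in excitation:
        s = e - sum(a * past for a, past in zip(lp_coeffs, state))
        out.append(s)
        state = ([s] + state)[:len(lp_coeffs)]
    return out, state


def match_block_gain(unit_excitation, lp_coeffs, state, target_rms):
    """Pick the excitation gain so the synthesized block hits target_rms.

    The filter is linear, so output = zir + gain * zsr, where zir is the
    ringing from the carried-over state and zsr is the response to the
    unit-gain excitation from a zero state. Matching the block energy is
    then a quadratic in the gain.
    """
    n = len(unit_excitation)
    zir, _ = lp_synthesize([0.0] * n, lp_coeffs, state)
    zsr, _ = lp_synthesize(unit_excitation, lp_coeffs, [0.0] * len(lp_coeffs))
    a = sum(x * x for x in zsr)
    b = sum(x * y for x, y in zip(zir, zsr))
    c = sum(x * x for x in zir)
    target_energy = n * target_rms ** 2
    disc = b * b - a * (c - target_energy)
    if a == 0.0 or disc < 0.0:
        # Assumption: if no gain can reach the target, fall back to silence
        # for this block's excitation.
        gain = 0.0
    else:
        gain = max(0.0, (-b + math.sqrt(disc)) / a)
    scaled = [gain * e for e in unit_excitation]
    speech, new_state = lp_synthesize(scaled, lp_coeffs, state)
    return gain, speech, new_state


if __name__ == "__main__":
    pulse = [1.0] + [0.0] * 7          # one unit pulse per 8-sample block
    state = [0.0]
    # The LP coefficient changes between blocks, but the output RMS does not.
    for coeffs in ([-0.9], [-0.5]):
        gain, speech, state = match_block_gain(pulse, coeffs, state, 0.5)
        print(coeffs, round(gain, 4), round(rms(speech), 4))
```

Because the gain is solved from the synthesized output rather than fixed at the excitation, a change in the LP coefficients between blocks alters the gain but leaves the output amplitude on target, which is the property the abstract attributes to the decoder-side loop.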
Pages: 449-457
Page count: 9