Low bit-rate speech coding based on multicomponent AFM signal model

Cited by: 7
Authors
Bansal M. [1]
Sircar P. [1]
Affiliation
[1] Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur, 208016, Uttar Pradesh
Keywords
Discrete energy separation algorithm; Fourier–Bessel expansion; Multi-tone amplitude and frequency modulation; Non-stationary signal analysis; Parametric model; Speech coding
DOI
10.1007/s10772-018-9542-5
Abstract
In this paper, we propose a novel multicomponent amplitude and frequency modulated (AFM) signal model for parametric representation of speech phonemes. An efficient technique is developed for parameter estimation of the proposed model. The Fourier–Bessel series expansion is used to separate a multicomponent speech signal into a set of individual components. The discrete energy separation algorithm is used to extract the amplitude envelope (AE) and the instantaneous frequency (IF) of each component of the speech signal. Then, the parameter estimation of the proposed AFM signal model is carried out by analysing the AE and IF parts of the signal component. The developed model is found to be suitable for representation of an entire speech phoneme (voiced or unvoiced) irrespective of its time duration, and the model is shown to be applicable for low bit-rate speech coding. The symmetric Itakura–Saito and the root-mean-square log-spectral distance measures are used for comparison of the original and reconstructed speech signals. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
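The abstract names the discrete energy separation algorithm (DESA) as the step that turns one separated component into an amplitude envelope (AE) and an instantaneous frequency (IF). The sketch below assumes the DESA-1 variant of Maragos, Kaiser and Quatieri (the abstract does not say which variant the authors use) and applies it to a single synthetic AM-FM component, skipping the Fourier–Bessel separation step entirely. The test signal, the 8 kHz sampling rate, and the helper names `teager_kaiser`, `desa1` and `nanmean` are illustrative assumptions. The mean AE and mean IF printed at the end are a crude stand-in, not the paper's AFM parameter estimator.

```python
import math
from typing import List, Sequence, Tuple


def teager_kaiser(x: Sequence[float], n: int) -> float:
    """Discrete Teager-Kaiser energy operator: Psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    return x[n] * x[n] - x[n - 1] * x[n + 1]


def desa1(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """DESA-1 estimates of the AE and IF (rad/sample) of ONE AM-FM component.

    Assumption: x is already a single component (e.g. after band separation).
    Outputs cover samples n = 2 .. len(x) - 3; samples where the energy
    ratios are invalid are marked NaN instead of being interpolated.
    """
    # Backward difference y(n) = x(n) - x(n-1); y[0] is padding and never read.
    y = [0.0] + [x[n] - x[n - 1] for n in range(1, len(x))]
    ae: List[float] = []
    inst_freq: List[float] = []
    for n in range(2, len(x) - 2):
        psi_x = teager_kaiser(x, n)
        psi_y = teager_kaiser(y, n) + teager_kaiser(y, n + 1)
        if psi_x <= 0.0 or psi_y < 0.0:
            ae.append(math.nan)
            inst_freq.append(math.nan)
            continue
        # cos(Omega) ~ 1 - (Psi[y(n)] + Psi[y(n+1)]) / (4 Psi[x(n)]), clamped for round-off.
        g = max(-1.0, min(1.0, 1.0 - psi_y / (4.0 * psi_x)))
        denom = 1.0 - g * g  # estimate of sin^2(Omega)
        if denom <= 0.0:
            ae.append(math.nan)
            inst_freq.append(math.nan)
            continue
        inst_freq.append(math.acos(g))
        ae.append(math.sqrt(psi_x / denom))  # |a| ~ sqrt(Psi[x] / sin^2(Omega))
    return ae, inst_freq


def nanmean(values: Sequence[float]) -> float:
    """Mean of the non-NaN entries (NaN if there are none)."""
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else math.nan


if __name__ == "__main__":
    fs = 8000.0  # assumed sampling rate (Hz)
    # Hypothetical component: 500 Hz carrier, 20 % AM at 10 Hz, 50 Hz FM deviation at 8 Hz.
    x = [
        (1.0 + 0.2 * math.cos(2 * math.pi * 10 * n / fs))
        * math.cos(2 * math.pi * 500 * n / fs + (50 / 8) * math.sin(2 * math.pi * 8 * n / fs))
        for n in range(8000)
    ]
    ae, inst_freq = desa1(x)
    # Crude stand-ins for model parameters: mean AE and mean IF (converted to Hz).
    print(f"mean AE: {nanmean(ae):.3f}")
    print(f"mean IF: {nanmean(inst_freq) * fs / (2 * math.pi):.1f} Hz")
```

For a pure tone A cos(Ωn + φ), both DESA-1 formulas are exact (Ψ[x] = A² sin²Ω), so on real components the errors come only from fast modulation, noise, or imperfect separation. Marking invalid samples as NaN rather than interpolating keeps the sketch short; a practical coder would likely smooth the AE and IF tracks before fitting model parameters.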
Pages: 783–795
Number of pages: 12