Wrapped Gaussian Mixture Models for Modeling and High-Rate Quantization of Phase Data of Speech

Cited by: 30
Authors
Agiomyrgiannakis, Yannis [1, 2]
Stylianou, Yannis [1, 2]
Affiliations
[1] FORTH, Inst Comp Sci, GR-70013 Iraklion, Crete, Greece
[2] Univ Crete, Dept Comp Sci, Multimedia Informat Lab, Iraklion 71409, Crete, Greece
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / Issue 04
Keywords
Circular statistics; phase quantization; sinusoidal models; speech analysis; speech coding; voice-over-IP; wrapped Gaussian mixture models (WGMMs);
DOI
10.1109/TASL.2008.2008229
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The harmonic representation of speech signals has found many applications in speech processing. This paper presents a novel statistical approach to model the behavior of harmonic phases. Phase information is decomposed into three parts: a minimum phase part, a translation term, and a residual term referred to as dispersion phase. Dispersion phases are modeled by wrapped Gaussian mixture models (WGMMs) using an expectation-maximization algorithm suitable for circular vector data. A multivariate WGMM-based phase quantizer is then proposed and constructed using novel scalar quantizers for circular random variables. The proposed phase modeling and quantization scheme is evaluated in the context of a narrowband harmonic representation of speech. Results indicate that it is possible to construct a variable-rate harmonic codec that is equivalent to iLBC at approximately 13 kbps.
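As a rough illustration of the wrapped Gaussian mixture idea named in the abstract, the following is a minimal scalar sketch, not the authors' implementation. It uses the standard definition of a wrapped Gaussian (a Gaussian density summed over all 2*pi shifts of the angle), truncated to a few wraps. The function names `wrapped_gaussian_pdf` and `wgmm_pdf`, the `n_wraps` truncation parameter, and the example weights, means, and standard deviations are all assumptions made for illustration; the paper's multivariate model and its EM training are not reproduced here.

```python
import math


def wrapped_gaussian_pdf(theta, mu, sigma, n_wraps=3):
    """Truncated wrapped Gaussian density for an angle theta in radians.

    Sums the ordinary Gaussian density over the shifts theta + 2*pi*k
    for k = -n_wraps..n_wraps (hypothetical truncation parameter).
    """
    total = 0.0
    for k in range(-n_wraps, n_wraps + 1):
        z = (theta - mu + 2.0 * math.pi * k) / sigma
        total += math.exp(-0.5 * z * z)
    return total / (sigma * math.sqrt(2.0 * math.pi))


def wgmm_pdf(theta, weights, mus, sigmas, n_wraps=3):
    """Scalar wrapped Gaussian mixture: weighted sum of wrapped Gaussians."""
    return sum(
        w * wrapped_gaussian_pdf(theta, m, s, n_wraps)
        for w, m, s in zip(weights, mus, sigmas)
    )


# Illustrative two-component mixture (values are made up, not from the paper).
weights = [0.6, 0.4]
mus = [-1.0, 2.5]
sigmas = [0.5, 0.8]
density_at_zero = wgmm_pdf(0.0, weights, mus, sigmas)
```

Because the density is periodic in the angle, evaluating it at theta and at theta + 2*pi gives the same value up to truncation error, which is the property that makes this family suitable for phase data.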
Pages: 775-786
Number of pages: 12
Related papers
38 in total
[11]  
Bjorck A, 1996, NUMERICAL METHODS LE, DOI 10.1137/1.9781611971484
[12]  
CHAZAN D, 2002, P ICSLP 2002, P2381
[13]  
GAROFOLO J, 1993, P LING DAT CONS
[14]  
Gersho A., 2012, Vector Quantization and Signal Compression, V159
[15]   Enhanced waveform interpolative coding at low bit-rate [J].
Gottesman, O ;
Gersho, A .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (08) :786-798
[16]   MULTIBAND EXCITATION VOCODER [J].
GRIFFIN, DW ;
LIM, JS .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1988, 36 (08) :1223-1235
[17]  
HEDELIN P, 1988, P IEEE ICASSP 88, P339
[18]   Hidden Markov models for circular and linear-circular time series [J].
Holzmann, Hajo ;
Munk, Axel ;
Suster, Max ;
Zucchini, Walter .
ENVIRONMENTAL AND ECOLOGICAL STATISTICS, 2006, 13 (03) :325-347
[19]  
JIANG Y, 1995, P IEEE WORKSH SPEECH, P21
[20]   On the perceptually irrelevant phase information in sinusoidal representation of speech [J].
Kim, DS .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (08) :900-905