Ultra low bit rate speech coding using an ergodic hidden Markov model

Cited by: 0
Authors
Lee, ME [1]
Durey, AS [1]
Moore, E [1]
Clements, M [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Ctr Signal & Image Proc, Atlanta, GA 30332 USA
Source
2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING | 2005
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents the framework for an ultra low bit rate speech vocoder. The system is based on a recognition-synthesis paradigm in which a single ergodic hidden Markov model (EHMM) is used to capture the statistical characterizations of speech in a flexible manner capable of limiting the effects of recognition errors. Because predetermined speech units are not used, this system has the advantage of not requiring a transcription for the training data set. By incorporating a mixed excitation scheme based on an improved MELP formulation into the EHMM, additional gains in quality and speaker characterization are achieved at no cost to the bit rate.
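The recognition-synthesis idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fully connected (ergodic) HMM with one isotropic Gaussian per state, encodes each frame as a state index via Viterbi decoding, and resynthesizes by looking up the state means. All names (`viterbi`, `encode`, `decode`, `upper_bound_bps`) and all numbers (4 states, 2-dimensional features, 64 states at 50 frames per second) are hypothetical and chosen only for illustration; the mixed excitation scheme is not modeled.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state path; log_B[t, j] is the log-likelihood of frame t in state j."""
    T, N = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        # scores[i, j]: best log-score of a path ending in state i, then moving i -> j
        scores = delta[:, None] + log_A
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(N)] + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def gaussian_loglik(frames, means, var=1.0):
    # Isotropic Gaussian log-likelihood per state, up to an additive constant
    d2 = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return -0.5 * d2 / var

def encode(frames, means, log_pi, log_A):
    # "Recognition": only the state indices would be transmitted
    return viterbi(log_pi, log_A, gaussian_loglik(frames, means))

def decode(path, means):
    # "Synthesis": each received index is mapped back to its state mean
    return means[path]

def upper_bound_bps(n_states, frames_per_second):
    # Fixed-length coding of one state index per frame, with no entropy coding
    return float(np.log2(n_states) * frames_per_second)

# Hypothetical toy model: 4 states, every state reachable from every other state
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
log_pi = np.log(np.full(4, 0.25))
log_A = np.log(np.full((4, 4), 0.25))
frames = np.array([[0.1, -0.2], [0.3, 0.1], [9.8, 0.2], [10.1, 9.7]])

path = encode(frames, means, log_pi, log_A)
reconstructed = decode(path, means)
```

Because no state corresponds to a predetermined speech unit, a model of this kind could in principle be trained on unlabeled feature frames, which is consistent with the abstract's remark that no transcription is required. Under the hypothetical figures above, 64 states at 50 frames per second would give at most 300 bits per second before any further compression of the state sequence.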
Pages: 765-768
Number of pages: 4