A very low bit rate speech coder using HMM-based speech recognition/synthesis techniques

Cited by: 0
Authors
Tokuda, K [1]
Masuko, T [1]
Hiroi, J [1]
Kobayashi, T [1]
Kitamura, T [1]
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 466, Japan
Source
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 | 1998
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a very low bit rate speech coder based on hidden Markov models (HMMs). The encoder carries out phoneme recognition and transmits phoneme indexes, state durations, and pitch information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indexes, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM using a maximum-likelihood (ML) speech parameter generation technique. Finally, synthetic speech is obtained by exciting the MLSA (Mel Log Spectrum Approximation) filter, whose coefficients are given by the mel-cepstral coefficients, according to the pitch information. A subjective listening test shows that the performance of the proposed coder at about 150 bit/s (for test data including 26% silence) is comparable to that of a vector quantization (VQ) based vocoder at 400 bit/s (= 8 bit/frame × 50 frame/s), with pitch left unquantized in both coders.
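The bit-rate comparison in the abstract can be checked with a little arithmetic. The sketch below is only illustrative: the frame-based figure (8 bit/frame at 50 frame/s) comes from the abstract, while the function names and every parameter of the phoneme-based estimate (phoneme rate, inventory size, states per phoneme, bits per state duration) are hypothetical values chosen for the example, not figures reported by the authors. Pitch is excluded, as in the abstract's comparison.

```python
import math


def vq_vocoder_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate of a frame-based coder that sends one VQ index per frame."""
    return bits_per_frame * frames_per_second


def phoneme_coder_bit_rate(
    phonemes_per_second: float,
    num_phonemes: int,
    states_per_phoneme: int,
    bits_per_state_duration: int,
) -> float:
    """Rough bit rate of a coder that sends, per phoneme, one phoneme index
    plus one quantized duration for each HMM state (pitch excluded).

    All arguments are assumptions for illustration only.
    """
    index_bits = math.ceil(math.log2(num_phonemes))
    duration_bits = states_per_phoneme * bits_per_state_duration
    return phonemes_per_second * (index_bits + duration_bits)


# Figure stated in the abstract: 8 bit/frame x 50 frame/s.
vq_rate = vq_vocoder_bit_rate(bits_per_frame=8, frames_per_second=50)

# Hypothetical settings, not taken from the paper.
hmm_rate = phoneme_coder_bit_rate(
    phonemes_per_second=12.0,
    num_phonemes=32,
    states_per_phoneme=3,
    bits_per_state_duration=2,
)

print(f"VQ vocoder: {vq_rate} bit/s")
print(f"Phoneme-based coder (hypothetical settings): {hmm_rate} bit/s")
```

The point of the sketch is the scaling: a frame-based coder pays a fixed cost per frame, whereas a phoneme-based coder pays per phoneme, so its rate depends on speaking rate and falls further when the input contains silence.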
Pages: 609-612
Page count: 4