Phone duration modeling for LVCSR using neural networks

Cited by: 1
Authors
Hadian, Hossein [1]
Povey, Daniel [2, 3]
Sameti, Hossein [1]
Khudanpur, Sanjeev [2, 3]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
automatic speech recognition; neural networks; phone duration models; reproducible results
DOI
10.21437/Interspeech.2017-1680
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We describe our work on incorporating probabilities of phone durations, learned by a neural network, into an ASR system. Phone durations are incorporated via lattice rescoring. The input features are derived from the phone identities in a context window of phones, plus the durations of the preceding phones within that window. Unlike some previous work, our network directly outputs the probability of each possible duration (in frames), up to a fixed limit. We evaluate this method on several large-vocabulary tasks; while we consistently see improvements in word error rate (WER), the improvements are smaller when the lattices are generated with neural-network-based acoustic models.
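As a rough illustration of the idea in the abstract, the following Python sketch builds an input vector from the phone identities in a context window plus the durations of the preceding phones, and maps it to a softmax over durations 1 to a fixed limit. It is a minimal sketch under stated assumptions, not the authors' implementation: the window sizes, the tiny phone inventory, the single linear layer, the random weights, the duration normalisation, and the choice to let durations above the limit share the last bin are all hypothetical and do not come from the record above.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
NUM_PHONES = 4      # hypothetical tiny phone inventory
LEFT_CONTEXT = 2    # phones before the current one
RIGHT_CONTEXT = 1   # phones after the current one
MAX_DURATION = 10   # fixed limit on the modelled duration, in frames


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def make_features(phones, durations, pos):
    """Phone identities over the window, plus durations of preceding phones."""
    feats = []
    for offset in range(-LEFT_CONTEXT, RIGHT_CONTEXT + 1):
        j = pos + offset
        if 0 <= j < len(phones):
            feats += one_hot(phones[j], NUM_PHONES)
        else:
            feats += [0.0] * NUM_PHONES  # zero padding outside the sequence
    for offset in range(-LEFT_CONTEXT, 0):
        j = pos + offset
        if j >= 0:
            # Assumed normalisation: clamp to the limit and scale to [0, 1].
            feats.append(min(durations[j], MAX_DURATION) / MAX_DURATION)
        else:
            feats.append(0.0)
    return feats


def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def duration_log_probs(features, weights, biases):
    """One linear layer followed by a softmax over durations 1..MAX_DURATION."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return log_softmax(logits)


def path_duration_score(phones, durations, weights, biases):
    """Sum of log-probabilities of the observed durations along one path."""
    total = 0.0
    for pos, dur in enumerate(durations):
        feats = make_features(phones, durations, pos)
        log_probs = duration_log_probs(feats, weights, biases)
        # Assumption: durations above the limit share the last output bin.
        total += log_probs[min(dur, MAX_DURATION) - 1]
    return total


# Untrained random weights stand in for a trained network.
rng = random.Random(0)
feat_dim = NUM_PHONES * (LEFT_CONTEXT + 1 + RIGHT_CONTEXT) + LEFT_CONTEXT
weights = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
           for _ in range(MAX_DURATION)]
biases = [0.0] * MAX_DURATION

phones = [0, 2, 1, 3]        # phone indices along one hypothetical lattice path
durations = [3, 7, 12, 5]    # durations in frames for the same path
score = path_duration_score(phones, durations, weights, biases)
print(f"duration log-score for this path: {score:.3f}")
```

In a rescoring setup, a score like this would typically be scaled and added to each lattice path's existing acoustic and language model scores before re-ranking; the scale and the way scores are combined are left open here because the abstract does not specify them.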
Pages: 518-522
Page count: 5