Training data selection for improving discriminative training of acoustic models

Cited by: 9
Authors
Liu, Shih-Hung [1]
Chu, Fang-Hui [1]
Lin, Shih-Hsiang [1]
Lee, Hung-Shin [1]
Chen, Berlin [1]
Affiliation
[1] Natl Taiwan Normal Univ, Grad Inst Comp Sci & Informat Engn, Taipei, Taiwan
Source
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2 | 2007
Keywords
speech recognition; discriminative training; acoustic models; data selection; entropy
DOI
10.1109/ASRU.2007.4430125
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper considers training data selection for discriminative training of acoustic models for broadcast news speech recognition. Three novel data selection approaches are proposed. First, the average phone accuracy over all hypothesized word sequences in the word lattice of a training utterance is used for utterance-level data selection. Second, phone-level data selection based on the difference between the expected accuracy of a phone arc and the average phone accuracy of the word lattice is investigated. Finally, frame-level data selection based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice is explored. The underlying characteristics of the presented approaches are extensively investigated, and their performance is verified by comparison with the standard discriminative training approaches. Experiments conducted on Mandarin broadcast news collected in Taiwan showed that both phone- and frame-level data selection could achieve slight but consistent improvements over the baseline systems in the earlier training iterations.
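The frame-level criterion named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption: the entropy is normalized by dividing by the log of the number of Gaussians active in the frame, and frames are kept when that value exceeds a threshold. The names `normalized_entropy`, `select_frames`, and the default `threshold` are hypothetical and are not taken from the paper.

```python
import math


def normalized_entropy(posteriors):
    """Entropy of a posterior distribution divided by log(N), giving a value in [0, 1].

    Assumption: normalization by log(N) for N competing Gaussians; the paper's
    exact normalization is not stated in the abstract.
    """
    n = len(posteriors)
    total = sum(posteriors)
    if n < 2 or total <= 0.0:
        return 0.0  # a single candidate (or empty mass) carries no uncertainty
    entropy = 0.0
    for p in posteriors:
        q = p / total  # renormalize so the values sum to one
        if q > 0.0:
            entropy -= q * math.log(q)
    return entropy / math.log(n)


def select_frames(frame_posteriors, threshold=0.05):
    """Return indices of frames whose normalized entropy exceeds the threshold.

    Assumption: high-entropy (confusable) frames are the ones kept; both the
    direction and the threshold value are illustrative only.
    """
    return [
        t
        for t, posteriors in enumerate(frame_posteriors)
        if normalized_entropy(posteriors) > threshold
    ]


if __name__ == "__main__":
    # Toy posteriors for three frames, each with two competing Gaussians.
    frames = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
    print(select_frames(frames))
```

In this toy setup a frame dominated by a single Gaussian has zero entropy and is skipped, while frames with competing Gaussians are retained for training statistics.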
Pages: 284-289
Page count: 6