MAIN VOWEL DOMAIN TONE MODELING WITH LEXICAL AND PROSODIC ANALYSIS FOR MANDARIN ASR

Times cited: 0
Authors
Zhang, Shilei [1]
Shi, Qin [1]
Chu, Stephen M. [2]
Qin, Yong [1]
Affiliations
[1] IBM Corp, China Res Lab, Beijing 100193, Peoples R China
[2] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
tone models; decision tree; main vowel; tone domain; lattice rescoring; SPEECH
DOI
10.1109/ICASSP.2009.4960645
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Tone is a distinctive, discriminative feature of Mandarin Chinese. Most large-scale Mandarin speech recognition systems handle tone modeling in a way that is often functional but seldom thorough; in particular, many lack the sophistication needed to deal with the many variations that arise from the combination of acoustic and lexical contexts. This paper reports an attempt to account for these variations and to bring richer tone modeling into the IBM Mandarin broadcast transcription system. We describe a system that combines the embedded approach with a novel explicit tone modeling technique characterized by (a) robust tone tracking in the main-vowel domain and (b) context-dependent models that use lexical and prosodic contexts. The proposed method is validated on a connected-digits set and then evaluated on a large-vocabulary broadcast transcription task, where it achieves relative character error rate reductions of 14.8% and 5.4%, respectively.
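The abstract does not say which features are extracted in the main-vowel domain. Purely as an illustration of the idea, the following Python sketch assumes a per-frame F0 track (in Hz, with 0.0 marking unvoiced frames) and a main-vowel frame span taken from a forced alignment, and summarizes the contour over only the voiced main-vowel frames by its mean log-F0 and least-squares slope. The names `ToneFeatures` and `main_vowel_tone_features` and the choice of features are hypothetical; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToneFeatures:
    mean_log_f0: float  # average log-F0 over voiced main-vowel frames
    slope: float        # least-squares slope of log-F0 per frame
    num_frames: int     # number of voiced frames actually used


def main_vowel_tone_features(
    f0_hz: List[float],
    main_vowel_span: Tuple[int, int],
) -> Optional[ToneFeatures]:
    """Summarize the F0 contour over the main-vowel frames [start, end).

    Hypothetical sketch: f0_hz holds one F0 value per frame, with 0.0
    marking unvoiced frames; main_vowel_span is assumed to come from a
    forced alignment of the syllable.
    """
    start, end = main_vowel_span
    # Keep only voiced frames inside the main vowel, in the log-F0 domain.
    points = [(t, math.log(f0_hz[t])) for t in range(start, end) if f0_hz[t] > 0.0]
    if len(points) < 2:
        return None  # too few voiced frames to estimate a contour
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Ordinary least-squares slope of log-F0 against frame index.
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return ToneFeatures(mean_log_f0=mean_y, slope=cov / var_t, num_frames=n)


# Usage: unvoiced initial frames are ignored and a rising contour gives a positive slope.
rising = main_vowel_tone_features([0.0, 0.0, 200.0, 210.0, 220.0, 230.0, 0.0], (2, 6))
print(rising)
```

Restricting the span to the main vowel leaves out the initial consonant and the transition regions, where F0 is often missing or unreliable. This is one plausible reading of why tracking tone in the main-vowel domain could be more robust.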
Pages: 4561 / +
Number of pages: 2