Using prosody to improve Mandarin automatic speech recognition

Times cited: 0
Authors
Ni, Chong-Jia [1]
Liu, Wen-Ju [1]
Xu, Bo [1]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010
Keywords
automatic speech recognition; prosody; MSD-HSMM; Maximum Entropy; corpus
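DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract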
This paper discusses how to model and train a prosody-dependent acoustic model for Mandarin, and how to decode input speech with a prosody-dependent speech recognition system. We use automatic prosody labeling methods to annotate the prosodic break type and stress type of each syllable in a continuous speech corpus, and apply our proposed methods to train prosody-dependent tonal syllable models, addressing the data sparsity that arises after prosody labeling. We also use MSD-HSMM to model prosodic factors such as pitch and duration, and for decoding we combine the MSD-HSMM model, a GMM-based prosody-dependent tonal syllable duration model, and a Maximum Entropy based syntactic prosody model. Compared with the baseline system, our prosody-dependent speech recognition systems significantly improve the tonal syllable correct rate.
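The abstract does not say how the three models are united at decoding time. One common choice, assumed here purely for illustration, is a weighted sum of log scores used to rank candidate tonal syllables. In the minimal sketch below, the acoustic (MSD-HSMM) and Maximum Entropy prosody log scores are taken as given inputs, only the GMM duration term is computed, and every function name, dictionary key, and weight is hypothetical rather than taken from the paper.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of a scalar x under a 1-D GMM.

    components: list of (weight, mean, variance) tuples (hypothetical layout).
    """
    log_terms = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for w, mean, var in components
    ]
    # Log-sum-exp over the mixture components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def combined_score(acoustic_logp, duration_logp, prosody_logp, w_dur=1.0, w_pros=1.0):
    """Weighted sum of log scores (an assumed combination rule, not the paper's)."""
    return acoustic_logp + w_dur * duration_logp + w_pros * prosody_logp


def best_hypothesis(hypotheses, duration_gmms, w_dur=1.0, w_pros=1.0):
    """Pick the candidate with the highest combined score.

    hypotheses: list of dicts with keys 'label', 'acoustic_logp',
                'prosody_logp', and 'duration' (seconds).
    duration_gmms: dict mapping a label to its GMM components.
    """
    def score(h):
        dur_logp = gmm_log_likelihood(h["duration"], duration_gmms[h["label"]])
        return combined_score(h["acoustic_logp"], dur_logp, h["prosody_logp"], w_dur, w_pros)

    return max(hypotheses, key=score)
```

With this kind of combination, a candidate whose acoustic score is slightly lower can still win if its observed duration fits its duration model much better; in practice the weights would be tuned on held-out data.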
Pages: 2698-2701
Number of pages: 4