Application of phonetic knowledge in automatic speech recognition - Case analysis

Authors
Cao, Jianfen [1]
Li, Aijun [1]
Hu, Fang [1]
Zhang, Ligang [1, 2]
Affiliations
[1] Institute of Linguistics, Chinese Academy of Social Sciences, Beijing 100732, China
[2] Institute of Computer Science and Technology, Tianjin University, Tianjin 300072, China
Source
Keywords
Speech recognition;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
A key question in automatic speech recognition (ASR) is how to improve recognition accuracy by exploiting phonetic knowledge. Early Chinese digit speech recognition systems had difficulty discriminating between 2 and 8. This paper discusses the application of phonetic knowledge in ASR through an analysis of this specific case. The study combines acoustic and physiological experiments with a set of perception tests to identify the phonetic features that distinguish 2 from 8. The results show that tonal information is the most prominent distinctive feature between 2 and 8. In the absence of tonal information, the third formant (F3) is the key discriminative factor, since the first formant (F1) and second formant (F2) of 2 and 8 are similar. However, early speech recognition systems ignored the tonal distinction. Moreover, in continuous speech, especially informal speech, the F3 difference between 2 (/er4/) and 8 (/ba1/) is often not significant because of articulatory undershoot of the tongue-tip movement in producing 2. Thus, more phonetic knowledge must be used to improve the accuracy of ASR systems.
Pages: 748-753