Application of phonetic knowledge in automatic speech recognition - Case analysis

Authors
Cao, Jianfen [1]
Li, Aijun [1]
Hu, Fang [1]
Zhang, Ligang [1, 2]
Affiliations
[1] Institute of Linguistics, Chinese Academy of Social Sciences, Beijing 100732, China
[2] Institute of Computer Science and Technology, Tianjin University, Tianjin 300072, China
Keywords
Speech recognition;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
One key topic in automatic speech recognition (ASR) is how to improve recognition accuracy by using phonetic knowledge. Early Chinese spoken-digit recognition systems had difficulty discriminating 2 and 8. This paper discusses the application of phonetic knowledge in ASR through an analysis of this specific case. The study combines acoustic and physiological experiments with a set of perception tests to investigate the distinctive phonetic features that separate 2 from 8. The results show that tonal information is the most prominent distinctive feature between 2 and 8. In the absence of tonal information, the features of the 3rd formant (F3) are the key discriminative factor, since the 1st formant (F1) and the 2nd formant (F2) of 2 and 8 are similar. However, the tonal distinction was ignored in early speech recognition systems. Moreover, in continuous speech, especially informal speech, the F3 difference between 2 (/er4/) and 8 (/ba1/) is often not significant, because of articulatory undershoot of the tongue tip movement in articulating 2. Thus, more phonetic knowledge must be used to improve the accuracy of ASR systems.
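The feature ordering described in the abstract (tone first, F3 as the fallback) can be illustrated with a small sketch. This is not the authors' method. It is a toy decision rule under stated assumptions: the function names `f0_slope` and `classify_digit` and both threshold values are hypothetical, tone 4 is treated as a falling F0 contour and tone 1 as a level one, and the rhotic vowel in 2 is assumed to lower F3, which is general phonetics rather than a result quoted from the abstract.

```python
from statistics import mean


def f0_slope(f0_hz):
    """Least-squares slope of an F0 contour, in Hz per frame."""
    xs = range(len(f0_hz))
    x_bar, y_bar = mean(xs), mean(f0_hz)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, f0_hz))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def classify_digit(f0_hz=None, f3_hz=None,
                   slope_threshold=-2.0, f3_threshold=2200.0):
    """Toy rule for 2 (/er4/) versus 8 (/ba1/).

    Both thresholds are hypothetical placeholders, not measured values.
    Returns "2", "8", or None when no usable feature is supplied.
    """
    # Tonal information is used first when an F0 contour is available.
    if f0_hz is not None and len(f0_hz) >= 2:
        # Assumption: tone 4 falls steeply, tone 1 stays roughly level.
        return "2" if f0_slope(f0_hz) < slope_threshold else "8"
    # Without tone, fall back to F3, since F1 and F2 are similar.
    if f3_hz:
        # Assumption: the rhotic vowel of 2 has a lower mean F3 than 8.
        return "2" if mean(f3_hz) < f3_threshold else "8"
    return None


print(classify_digit(f0_hz=[220, 200, 180, 160, 140]))
print(classify_digit(f3_hz=[2600, 2650]))
```

A fixed F3 threshold such as this one would be expected to fail in exactly the situation the abstract describes: when the tongue tip gesture undershoots in informal continuous speech, the F3 values of 2 and 8 move closer together and a single cutoff no longer separates them.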
Pages: 748-753