Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10
Authors
Oh, Donghoon [1 ,2 ]
Park, Jeong-Sik [3 ]
Kim, Ji-Hwan [4 ]
Jang, Gil-Jin [2 ,5 ]
Affiliations
[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea
[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 01
Funding
National Research Foundation of Singapore;
Keywords
speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS;
DOI
10.3390/app11010428
CLC Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Featured Application: Automatic speech recognition; chatbots; voice-assisted control; multimodal man-machine interaction systems.
Speech recognition converts input sound into a sequence of phonemes and then finds the corresponding text using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains a challenging problem even for state-of-the-art classification methods, and classification errors are difficult to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies recognition models better suited to the different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix obtained from a baseline speech recognition model. Based on the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3.0%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
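The pipeline summarized above (baseline model, confusion matrix, automatic phoneme clustering, group-specific classifiers) can be illustrated with a minimal sketch of the grouping step alone. In the Python snippet below, the cluster_phonemes helper, the symmetric confusability-to-distance transform, the average-linkage clustering, and the 0.9 threshold are illustrative assumptions, not the exact procedure or settings reported in the paper.

# A minimal sketch of confusion-driven phoneme grouping, assuming the confusion
# matrix of a baseline classifier is available as a NumPy array of counts.
# Distance transform, linkage method, and threshold are illustrative choices.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_phonemes(confusion, phonemes, threshold=0.9):
    """Group phonemes that the baseline model frequently confuses with each other."""
    # Row-normalize counts to P(predicted | true).
    prob = confusion / confusion.sum(axis=1, keepdims=True)
    # Symmetric confusability and its complement as a distance.
    sim = 0.5 * (prob + prob.T)
    np.fill_diagonal(sim, 1.0)
    dist = 1.0 - sim
    # Condensed upper-triangular distances for SciPy's agglomerative clustering.
    condensed = dist[np.triu_indices(len(phonemes), k=1)]
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=threshold, criterion="distance")
    groups = {}
    for phone, label in zip(phonemes, labels):
        groups.setdefault(label, []).append(phone)
    return list(groups.values())

# Toy example with made-up counts: /s/-/z/ and /p/-/b/ are mutually confusable.
phones = ["s", "z", "p", "b"]
conf = np.array([[80, 15,  3,  2],
                 [12, 82,  2,  4],
                 [ 2,  1, 70, 27],
                 [ 1,  3, 25, 71]])
print(cluster_phonemes(conf, phones))  # two groups: ['s', 'z'] and ['p', 'b']

In the hierarchical scheme described in the abstract, each resulting group would then receive its own optimized classification model, with a first-stage classifier routing inputs to the appropriate group model; that second stage is omitted from this sketch.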
Pages: 1-17 (17 pages)
Related Papers (50 in total)
  • [41] Conversion from Phoneme Based to Grapheme Based Acoustic Models for Speech Recognition
    Zgank, Andrej
    Kacic, Zdravko
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006 : 1587 - 1590
  • [42] Rule-Based Embedded HMMs Phoneme Classification to Improve Qur'anic Recitation Recognition
    Alqadasi, Ammar Mohammed Ali
    Sunar, Mohd Shahrizal
    Turaev, Sherzod
    Abdulghafor, Rawad
    Salam, Md Sah Hj
    Alashbi, Abdulaziz Ali Saleh
    Salem, Ali Ahmed
    Ali, Mohammed A. H.
    ELECTRONICS, 2023, 12 (01)
  • [43] Using Vector of Fractal Dimensions for Feature Reduction and Phoneme Recognition and Classification
    Hosseini, S. Abolfazl
    Ghassemian, Hassan
    Alizadeh, Roya
    2012 20TH TELECOMMUNICATIONS FORUM (TELFOR), 2012 : 748 - 751
  • [44] Effect of aging on speech features and phoneme recognition: A study on Bengali voicing vowels
    Das, B.
    Mandal, S.
    Mitra, P.
    Basu, A.
    KLUWER ACADEMIC PUBLISHERS, 16 : 19 - 31
  • [45] EVIDENCE FOR THE STRENGTH OF THE RELATIONSHIP BETWEEN AUTOMATIC SPEECH RECOGNITION AND PHONEME ALIGNMENT PERFORMANCE
    Baghai-Ravary, Ladan
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010 : 5262 - 5265
  • [46] LANGUAGE DEPENDENT UNIVERSAL PHONEME POSTERIOR ESTIMATION FOR MIXED LANGUAGE SPEECH RECOGNITION
    Imseng, David
    Bourlard, Herve
    Magimai-Doss, Mathew
    Dines, John
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011 : 5012 - 5015
  • [47] Phoneme Classification Using the Auditory Neurogram
    Alam, Md. Shariful
    Zilany, Muhammad S. A.
    Jassim, Wissam A.
    Ahmad, Mohd Yazed
    IEEE ACCESS, 2017, 5 : 633 - 642
  • [48] CROSS-LINGUAL PHONEME MAPPING FOR LANGUAGE ROBUST CONTEXTUAL SPEECH RECOGNITION
    Patel, Ami
    Li, David
    Cho, Eunjoon
    Aleksic, Petar
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 5924 - 5928
  • [49] Deep belief networks for phoneme recognition in continuous Tamil speech-an analysis
    Raguram, Laxmi Sree Baskaran
    Shanmugam, Vijaya Madhaya
    TRAITEMENT DU SIGNAL, 2017, 34 (3-4) : 137 - 151
  • [50] AN IMPROVED METHOD FOR SPEECH/SPEAKER RECOGNITION
    Gaafar, Tamer S.
    Bakr, Hitham M. Abo
    Abdalla, Mahmoud I.
    2014 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2014