Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10
Authors
Oh, Donghoon [1, 2]
Park, Jeong-Sik [3]
Kim, Ji-Hwan [4]
Jang, Gil-Jin [2, 5]
Affiliations
[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea
[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 1
Funding
National Research Foundation, Singapore
Keywords
speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS
DOI
10.3390/app11010428
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.
Speech recognition consists of converting input sound into a sequence of phonemes, then finding text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
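The abstract describes grouping phonemes automatically from a baseline model's confusion matrix. The following is a minimal sketch of that idea, not the authors' implementation: it assumes average-linkage agglomerative clustering (the abstract does not name the algorithm), and the function names, the four-phoneme inventory, and the confusion counts are all hypothetical values chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy data: rows are true phonemes, columns are predictions.
PHONEMES = ["s", "z", "m", "n"]
CONFUSION = [
    [80, 18, 1, 1],   # true "s"
    [20, 75, 3, 2],   # true "z"
    [1, 2, 70, 27],   # true "m"
    [2, 1, 25, 72],   # true "n"
]


def confusion_similarity(confusion):
    """Turn raw counts into a symmetric similarity: mean mutual confusion rate."""
    n = len(confusion)
    rates = [[count / sum(row) for count in row] for row in confusion]
    return [[0.5 * (rates[i][j] + rates[j][i]) for j in range(n)]
            for i in range(n)]


def cluster_phonemes(labels, confusion, n_groups):
    """Merge the most-confused clusters until n_groups remain (assumed method)."""
    sim = confusion_similarity(confusion)
    clusters = [[i] for i in range(len(labels))]

    def linkage(a, b):
        # Average similarity over all cross-cluster phoneme pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return sorted(sorted(labels[i] for i in c) for c in clusters)


groups = cluster_phonemes(PHONEMES, CONFUSION, 2)
print(groups)  # [['m', 'n'], ['s', 'z']]
```

On this toy input the fricatives and the nasals end up in separate groups, which is the kind of partition a group-specific classifier could then be trained on.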
Pages: 1-17
Page count: 17