Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10
Authors
Oh, Donghoon [1, 2]
Park, Jeong-Sik [3]
Kim, Ji-Hwan [4]
Jang, Gil-Jin [2, 5]
Affiliations
[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea
[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 1
Funding
National Research Foundation, Singapore
Keywords
speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS
DOI
10.3390/app11010428
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.
Speech recognition consists of converting input sound into a sequence of phonemes and then finding the text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and the resulting classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Based on the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2 percentage-point overall improvement.
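The clustering step described in the abstract (grouping phonemes that a baseline model tends to confuse) can be illustrated with a minimal sketch. This is not the authors' procedure: the symmetric confusion-rate similarity, the average-linkage merging rule, the `threshold` parameter, the function names, and the toy confusion counts are all assumptions made for illustration.

```python
def confusion_similarity(conf):
    """Symmetric confusion rate between each pair of classes (assumed measure)."""
    n = len(conf)
    # Normalize each row so entries are rates: P(predicted j | true i).
    rates = [[c / sum(row) for c in row] for row in conf]
    return [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)] for i in range(n)]


def cluster_phonemes(labels, conf, threshold):
    """Average-linkage agglomerative clustering over confusion rates.

    Merging stops once no pair of clusters has a mean confusion rate
    of at least `threshold` (a hypothetical stopping rule).
    """
    sim = confusion_similarity(conf)
    clusters = [[i] for i in range(len(labels))]

    def linkage(a, b):
        # Mean pairwise similarity between members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = max(
            (linkage(a, b), (x, y))
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(labels[i] for i in c) for c in clusters]


# Toy confusion counts (hypothetical, not from the paper); rows are true phonemes.
labels = ["s", "z", "m", "n"]
conf = [
    [80, 18, 1, 1],
    [20, 78, 1, 1],
    [1, 1, 75, 23],
    [1, 1, 25, 73],
]
groups = cluster_phonemes(labels, conf, threshold=0.1)
print(groups)
```

On this toy input the two fricatives end up in one group and the two nasals in another. In a hierarchical classifier, a first-stage model would pick the group and a group-specific model would then choose the phoneme within it.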
Pages: 1-17
Page count: 17