Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10

Authors:

Oh, Donghoon [1,2]

Park, Jeong-Sik [3]

Kim, Ji-Hwan [4]

Jang, Gil-Jin [2,5]

Affiliations:

[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea

[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea

[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea

[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea

[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 01

Funding:

National Research Foundation of Singapore;

Keywords:

speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS;

DOI:

10.3390/app11010428

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.

Speech recognition consists of converting input sound into a sequence of phonemes and then finding text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2 percentage-point overall improvement.
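The idea of deriving phoneme groups from a baseline confusion matrix can be illustrated with a small sketch. The abstract does not specify the clustering algorithm, so the average-linkage agglomerative clustering below is an assumption, as are the function names `confusion_similarity` and `cluster_phonemes`. The six-phoneme confusion counts are invented for illustration and are not TIMIT results.

```python
from itertools import combinations

# Toy confusion counts (rows: true phoneme, columns: predicted phoneme).
# These numbers are made up for illustration only.
LABELS = ["s", "z", "m", "n", "p", "b"]
CONF = [
    [80, 15, 0, 0, 3, 2],
    [18, 75, 1, 1, 2, 3],
    [0, 1, 70, 25, 2, 2],
    [1, 0, 22, 74, 1, 2],
    [2, 2, 1, 1, 78, 16],
    [1, 3, 2, 2, 20, 72],
]


def confusion_similarity(conf, labels):
    """Symmetric similarity: mean of the two row-normalized confusion rates."""
    n = len(labels)
    rates = [[conf[i][j] / sum(conf[i]) for j in range(n)] for i in range(n)]
    return {
        frozenset((labels[i], labels[j])): 0.5 * (rates[i][j] + rates[j][i])
        for i, j in combinations(range(n), 2)
    }


def cluster_phonemes(conf, labels, n_groups):
    """Merge the most mutually confused clusters until n_groups remain."""
    sim = confusion_similarity(conf, labels)
    clusters = [[p] for p in labels]

    def linkage(a, b):
        # Average-linkage: mean pairwise similarity between two clusters.
        total = sum(sim[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_groups:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


groups = cluster_phonemes(CONF, LABELS, n_groups=3)
print(groups)
```

In a hierarchical classifier of the kind the abstract describes, each resulting group would then receive its own specialized model, with a first-stage model routing inputs to the group; that second stage is not shown here.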

Pages: 1 / 17

Number of pages: 17
