Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10

Authors:

Oh, Donghoon [1,2]

Park, Jeong-Sik [3]

Kim, Ji-Hwan [4]

Jang, Gil-Jin [2,5]

Affiliations:

[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea

[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea

[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea

[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea

[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 01

Funding:

National Research Foundation, Singapore;

Keywords:

speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS;

DOI:

10.3390/app11010428

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Subject classification code:

0703;

Abstract:

Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.

Speech recognition consists of converting input sound into a sequence of phonemes and then finding text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
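The abstract mentions grouping phonemes automatically from a baseline confusion matrix but does not say which clustering algorithm is used. As a rough illustration only, the sketch below assumes a greedy average-linkage merge over symmetrised confusion rates. The function name `cluster_phonemes`, the similarity definition, and the toy counts are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def cluster_phonemes(labels, confusion, n_groups):
    """Group phonemes that a baseline model tends to confuse with each other.

    Assumption: greedy average-linkage merging, not the paper's algorithm.
    confusion[i][j] = count of true phoneme i predicted as phoneme j.
    """
    n = len(labels)
    # Row-normalise the counts so each row holds P(predicted | true).
    rates = [[c / sum(row) for c in row] for row in confusion]
    # Symmetric similarity: mean of the two directed confusion rates.
    sim = [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)]
           for i in range(n)]

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_groups:
        def linkage(pair):
            a, b = pair
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))

        # Merge the two clusters with the highest average mutual confusion.
        a, b = max(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return sorted(sorted(labels[i] for i in c) for c in clusters)


# Hypothetical toy counts (rows: true phoneme, columns: predicted phoneme).
toy_labels = ["s", "z", "m", "n"]
toy_confusion = [
    [80, 15, 3, 2],
    [20, 70, 5, 5],
    [2, 3, 75, 20],
    [1, 4, 25, 70],
]
toy_groups = cluster_phonemes(toy_labels, toy_confusion, 2)
print(toy_groups)  # [['m', 'n'], ['s', 'z']]
```

In a hierarchical setup of the kind the abstract describes, each resulting group would then be assigned its own specialised classifier, with a first-stage model choosing the group.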


Pages: 1 / 17

Page count: 17

