Automatic speech recognition of Portuguese phonemes using neural networks ensemble

Cited by: 4

Authors:

Nedjah, Nadia ^{[1]}

Bonilla, Alejandra D. ^{[1]}

Mourelle, Luiza de Macedo ^{[2]}

Affiliations:

[1] Univ Estado Rio De Janeiro, Engn Fac, Dept Elect Engn & Telecommun, Rio de Janeiro, RJ, Brazil

[2] Univ Estado Rio De Janeiro, Engn Fac, Dept Syst Engn & Computat, Rio De Janeiro, RJ, Brazil

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023 / Vol. 229

Keywords:

Automatic speech recognition; Phonetic recognition; Artificial neural networks; Ensemble; EXPERTS;

DOI:

10.1016/j.eswa.2023.120378

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic speech recognition based on phoneme detection offers advantages for the online recognition of speech represented by a sound signal. Developing an automatic speech recognition system is a multidisciplinary task that spans several research areas, such as linguistics, signal processing and computational intelligence. In this work, the process starts with a pre-processing step that extracts the main features of the speech signal at a given instant of time. Inspired by the "divide and conquer" principle, we bridge the complexity gap of automatic speech recognition by devising models based on an ensemble of neural network experts. The ensemble divides the huge decision space of speech recognition so that each expert handles only a delimited area of it. This novel application of the strategy improves the precision, sensitivity and accuracy of the recognition process. Each expert in the ensemble makes a decision for every pre-processed input sample, and the resulting set of decisions is weighted: the expert with the highest weight for the output determines the final classification of the sample. A dynamic post-processing step, implemented as a recurrent neural network, is then executed to mitigate the oscillatory effect that occurs during the recognition of classes with similar characteristics. Two ensembles are investigated. The first is based on clustering similar phonetic classes, while the second addresses the imbalanced distribution of samples in the training set. The proposed model achieves a 7.63% improvement in accuracy over the best related model reported so far for automatic speech recognition.
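The weighted decision step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Expert` signature, the `ensemble_decide` function, the product of weight and confidence used as the score, and the two toy experts are all assumptions made for illustration, since the abstract does not say how the weights are computed.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical expert interface (an assumption, not from the paper):
# maps a feature vector to (phoneme label, confidence in [0, 1]).
Expert = Callable[[Sequence[float]], Tuple[str, float]]


def ensemble_decide(
    experts: List[Expert],
    expert_weights: List[float],
    features: Sequence[float],
) -> Tuple[str, float]:
    """Return the label proposed by the expert with the highest weighted score."""
    if not experts or len(experts) != len(expert_weights):
        raise ValueError("need one weight per expert and at least one expert")
    best_label, best_score = "", float("-inf")
    for expert, weight in zip(experts, expert_weights):
        label, confidence = expert(features)
        score = weight * confidence  # assumed weighting scheme
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy experts, each covering a delimited part of the decision space.
def vowel_expert(features: Sequence[float]) -> Tuple[str, float]:
    return ("a", 0.9) if features[0] > 0.5 else ("e", 0.6)


def fricative_expert(features: Sequence[float]) -> Tuple[str, float]:
    return ("s", 0.8) if features[1] > 0.5 else ("f", 0.4)


label, score = ensemble_decide(
    [vowel_expert, fricative_expert], [1.0, 0.7], [0.8, 0.9]
)
```

With these toy weights, the vowel expert wins the sample; lowering its weight relative to the fricative expert flips the decision, which is the behavior the abstract attributes to the weighted decision set.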

References

Pages: 23

72 entries in total

[1] Convolutional Neural Networks for Speech Recognition [J]. Abdel-Hamid, Ossama; Mohamed, Abdel-Rahman; Jiang, Hui; Deng, Li; Penn, Gerald; Yu, Dong. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10): 1533-1545

[2] Interpretation of convolutional neural networks for speech spectrogram regression from intracranial recordings [J]. Angrick, Miguel; Herff, Christian; Johnson, Garett; Shih, Jerry; Krusienski, Dean; Schultz, Tanja. NEUROCOMPUTING, 2019, 342: 145-151

[3] Suitability of syllable-based modeling units for end-to-end speech recognition in Sanskrit and other Indian languages [J]. Anoop, Chandran Savithri; Ramakrishnan, Angarai Ganesan. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 220

[4] Baber C., 1991, ELLIS HORWOOD SERIES

[5] Barandela R, 2004, LECT NOTES COMPUT SC, V3138, P806

[6] CDHMM Parameters Selection for Speaker-Independent Phone Recognition In Continuous Speech System [J]. Ben Messaoud, Zaineb; Ben Hamida, Ahmed. MELECON 2010: THE 15TH IEEE MEDITERRANEAN ELECTROTECHNICAL CONFERENCE, 2010: 253-258

[7] Bisol Leda., 2005, Introducao a Estudos de Fonologia do Portugues Brasileiro, V4A

[8] Bonilla Cardona D. A., 2016, THESIS U ESTADO RIO

[9] Bonilla Cardona D. A., 2015, AN 12 C BRAS INT COM, P1

[10] Online phoneme recognition using multi-layer perceptron networks combined with recurrent non-linear autoregressive neural networks with exogenous inputs [J]. Bonilla Cardona, Diana A.; Nedjah, Nadia; Mourelle, Luiza M. NEUROCOMPUTING, 2017, 265: 78-90
