Persian speech recognition using deep learning

Cited by: 1

Authors:

Hadi Veisi

Armita Haji Mani

Affiliation:

[1] University of Tehran, Faculty of New Sciences and Technologies (FNST)

Source:

International Journal of Speech Technology | 2020 / Vol. 23

Keywords:

Persian speech recognition; Bidirectional long short-term memory neural network; Deep neural network; Deep belief network

DOI:

Not available

Abstract:

To date, various methods have been used for Automatic Speech Recognition (ASR), among which the Hidden Markov Model (HMM) and Artificial Neural Networks (ANNs) are the most important. One of the remaining challenges is increasing the accuracy and efficiency of these systems, and one way to improve their accuracy is to improve the acoustic model (AM). In this paper, for the first time, a deep belief network (DBN) for extracting features from speech signals is combined with a Deep Bidirectional Long Short-Term Memory (DBLSTM) network with a Connectionist Temporal Classification (CTC) output layer to create an AM on the Farsdat Persian speech data set. The results show that a deep neural network (DNN) improves on a shallow network. In addition, a bidirectional network is more accurate than a unidirectional one, in both deep and shallow networks. Comparing the results with those of the HMM and Kaldi-DNN indicates that using DBLSTM with features extracted by the DBN increases the accuracy of Persian phoneme recognition.

Pages: 893-905

Page count: 12

