Persian speech recognition using deep learning

Cited by: 1

Authors:

Hadi Veisi

Armita Haji Mani

Affiliation:

[1] University of Tehran, Faculty of New Sciences and Technologies (FNST)

Source:

International Journal of Speech Technology | 2020 / Vol. 23

Keywords:

Persian speech recognition; Bidirectional long short-term memory neural network; Deep neural network; Deep belief network

DOI:

Not available

Abstract:

Various methods have been used for Automatic Speech Recognition (ASR), of which the Hidden Markov Model (HMM) and Artificial Neural Networks (ANNs) are the most important. One remaining challenge is increasing the accuracy and efficiency of these systems, and one way to improve accuracy is to improve the acoustic model (AM). In this paper, for the first time, a deep belief network (DBN) for extracting features from speech signals is combined with a Deep Bidirectional Long Short-Term Memory (DBLSTM) network with a Connectionist Temporal Classification (CTC) output layer to create an AM on the Farsdat Persian speech data set. The results show that a deep neural network (DNN) outperforms a shallow network. In addition, a bidirectional network is more accurate than a unidirectional one, in both deep and shallow configurations. A comparison with HMM and Kaldi-DNN baselines indicates that using DBLSTM with features extracted by the DBN increases the accuracy of Persian phoneme recognition.
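
As a rough illustration of what a CTC output layer implies at decoding time, the sketch below shows best-path (greedy) CTC decoding: pick the highest-scoring label in each frame, merge consecutive repeats, then drop the blank symbol. This is a generic textbook procedure, not the authors' implementation; the function name `ctc_greedy_decode`, the choice of index 0 as the blank, and the toy scores are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding over a list of per-frame score vectors."""
    # Step 1: take the highest-scoring label index in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    # Step 2: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: remove blanks, which only separate genuine repeats.
    return [label for label in collapsed if label != blank]


# Toy scores for 6 frames over 3 labels (blank, phoneme 1, phoneme 2).
# These numbers are made up purely to exercise the function.
scores = [
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.20, 0.70],
    [0.10, 0.10, 0.80],
]
decoded = ctc_greedy_decode(scores)
print(decoded)
```

The blank between the two runs of label 1 is what lets the decoder keep both occurrences instead of merging them into one, which is why CTC needs a dedicated blank symbol.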


Pages: 893-905

Page count: 12
