Persian speech recognition using deep learning

Cited by: 1
Authors
Hadi Veisi
Armita Haji Mani
Affiliation
[1] University of Tehran, Faculty of New Sciences and Technologies (FNST)
Source
International Journal of Speech Technology | 2020 / Vol. 23
Keywords
Persian speech recognition; Bidirectional long short-term memory neural network; Deep neural network; Deep belief network
DOI
Not available
Abstract
Various methods have been used for Automatic Speech Recognition (ASR), of which the Hidden Markov Model (HMM) and Artificial Neural Networks (ANNs) are the most important. A key challenge is increasing the accuracy and efficiency of these systems, and one way to improve their accuracy is to improve the acoustic model (AM). In this paper, for the first time, a deep belief network (DBN), used to extract features from speech signals, is combined with a Deep Bidirectional Long Short-Term Memory (DBLSTM) network with a Connectionist Temporal Classification (CTC) output layer to build an AM on the Farsdat Persian speech data set. The results show that a deep neural network (DNN) outperforms a shallow network. In addition, the bidirectional network is more accurate than the unidirectional network for both deep and shallow architectures. Comparing the results with HMM and Kaldi-DNN systems indicates that using a DBLSTM with DBN-extracted features increases the accuracy of Persian phoneme recognition.
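To make the role of the CTC output layer concrete, here is a minimal Python sketch of the two standard CTC operations: the forward algorithm, which sums the probability of every frame-level alignment that collapses to a target phoneme sequence, and best-path (greedy) decoding. It is not the authors' implementation. It assumes the per-frame posteriors are already given (in the described system they would come from the DBLSTM fed with DBN features), that index 0 is the blank symbol, and that phonemes are hypothetical small integer IDs rather than the actual Farsdat phoneme set. It works with plain probabilities instead of log space, so it is only suitable for short toy inputs.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of each frame's distribution is the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]]) -> List[int]:
    """Best-path decoding: pick the most likely symbol in each frame,
    merge consecutive repeats, then drop blanks."""
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    out: List[int] = []
    prev = None
    for k in best:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def ctc_sequence_prob(frame_probs: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> float:
    """CTC forward algorithm: total probability of all alignments of
    `labels` over the frames (assumes at least one frame)."""
    # Extended label sequence with blanks interleaved: _ l1 _ l2 _ ... _
    ext = [BLANK]
    for lab in labels:
        ext += [lab, BLANK]
    S, T = len(ext), len(frame_probs)

    # An alignment may start with the leading blank or the first label.
    alpha = [0.0] * S
    alpha[0] = frame_probs[0][ext[0]]
    if S > 1:
        alpha[1] = frame_probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]         # advance by one position
            # Skip over a blank, allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * frame_probs[t][ext[s]]
        alpha = new

    # A valid alignment ends on the last label or the trailing blank.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


if __name__ == "__main__":
    # Toy posteriors: 2 frames, vocabulary {0: blank, 1: one hypothetical phoneme}.
    probs = [[0.5, 0.5], [0.5, 0.5]]
    # Alignments (1,1), (0,1) and (1,0) all collapse to [1].
    print(ctc_sequence_prob(probs, [1]))  # 0.75
    print(ctc_greedy_decode([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]))
```

During training, a CTC layer minimises the negative log of this sequence probability, which removes the need for frame-level phoneme alignments. Greedy decoding is the simplest inference rule, and beam search is a common alternative.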
Pages: 893-905
Number of pages: 12
Related papers
50 in total
  • [41] Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition
    Dehghani, Arash
    Seyyedsalehi, Seyyed Ali
    Neural Processing Letters, 2023, 55: 3205-3224
  • [42] A Deep Learning Approach for Speech Emotion Recognition Optimization Using Meta-Learning
    Ottoni, Lara Toledo Cordeiro
    Ottoni, Andre Luiz Carvalho
    Cerqueira, Jes de Jesus Fiais
    Electronics, 2023, 12 (23)
  • [43] Arabic Speech Recognition with Deep Learning: A Review
    Algihab, Wajdan
    Alawwad, Noura
    Aldawish, Anfal
    AlHumoud, Sarah
    Social Computing and Social Media: Design, Human Behavior and Analytics, SCSM 2019, Pt. I, 2019, 11578: 15-31
  • [44] Deep Learning for Environmentally Robust Speech Recognition
    Alhamada, A. I.
    Khalifa, O. O.
    Abdalla, A. H.
    Proceedings of the 7th International Conference on Electronic Devices, Systems and Applications (ICEDSA2020), 2020, 2306
  • [45] Emotion Recognition in Speech with Deep Learning Architectures
    Erdal, Mehmet
    Kaechele, Markus
    Schwenker, Friedhelm
    Artificial Neural Networks in Pattern Recognition, 2016, 9896: 298-311
  • [46] Research on a Deep Learning Method for Speech Recognition
    Xiao, Jia
    Xiaolin, Sun
    IAENG International Journal of Computer Science, 2024, 51 (09): 1272-1280
  • [47] Efficient Deep Learning for Pathological Speech Recognition
    Pham, Tuan D.
    2023 IEEE Conference on Artificial Intelligence (CAI), 2023: 103-104
  • [48] Deep Learning for Depression Recognition from Speech
    Tian, Han
    Zhu, Zhang
    Jing, Xu
    Mobile Networks & Applications, 2023, 29 (4): 1212-1227
  • [49] Speech Emotion Recognition Using Feature Fusion: A Hybrid Approach to Deep Learning
    Khan, Waleed Akram
    ul Qudous, Hamad
    Farhan, Asma Ahmad
    Multimedia Tools and Applications, 2024, 83 (31): 75557-75584
  • [50] Speech Emotion Recognition Using Gammatone Cepstral Coefficients and Deep Learning Features
    Sharan, Roneel V.
    2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), 2023: 139-142