Persian speech recognition using deep learning

Cited by: 1
Authors
Hadi Veisi
Armita Haji Mani
Affiliation
[1] University of Tehran, Faculty of New Sciences and Technologies (FNST)
Source
International Journal of Speech Technology | 2020 / Vol. 23
Keywords
Persian speech recognition; Bidirectional long short-term memory neural network; Deep neural network; Deep belief network;
DOI
Not available
Abstract
To date, various methods have been used for Automatic Speech Recognition (ASR), among which the Hidden Markov Model (HMM) and Artificial Neural Networks (ANNs) are the most important. One existing challenge is increasing the accuracy and efficiency of these systems, and one way to enhance accuracy is to improve the acoustic model (AM). In this paper, for the first time, a deep belief network (DBN), used to extract features from speech signals, is combined with a Deep Bidirectional Long Short-Term Memory (DBLSTM) network with a Connectionist Temporal Classification (CTC) output layer to create an AM on the Farsdat Persian speech data set. The results show that a deep neural network (DNN) improves on a shallow network. In addition, the bidirectional network is more accurate than the unidirectional network in both the deep and shallow settings. Comparing the results with those of HMM and Kaldi-DNN indicates that using DBLSTM with features extracted by the DBN increases the accuracy of Persian phoneme recognition.
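As a rough illustration of what a CTC output layer implies at decoding time, the sketch below applies best-path (greedy) decoding: take the highest-scoring label per frame, merge consecutive repeats, then drop the blank symbol. This is a generic sketch, not the paper's implementation. The blank index `0`, the function name `ctc_best_path`, and the toy three-class scores are assumptions made for the example only.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_best_path(frame_scores, blank=BLANK):
    """Collapse per-frame scores into a label sequence (greedy CTC decoding)."""
    # 1. Pick the highest-scoring label in every frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best)]
    # 3. Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Toy scores for 7 frames over 3 hypothetical classes: blank, phoneme 1, phoneme 2.
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
    [0.1, 0.8, 0.1],
]
print(ctc_best_path(scores))  # [1, 2, 1]
```

Because blanks are removed only after repeats are merged, the same phoneme can appear twice in a row in the output as long as a blank frame separates the two runs.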
Pages: 893-905
Number of pages: 12