Speech recognition based on unified model of acoustic and language aspects of speech

Cited by: 0
Authors:
Kubo, Yotaro; Ogawa, Atsunori; Hori, Takaaki; Nakamura, Atsushi
Source:
Nippon Telegraph and Telephone Corp., No. 11
Keywords:
Deep learning
DOI:
Not available
Abstract:
Automatic speech recognition has been attracting a great deal of attention recently and is considered an important technique for achieving natural interaction between humans and machines. However, recognizing spontaneous speech remains difficult owing to the wide variety of patterns it contains. We have been researching ways to overcome this problem and have developed a method that expresses both the acoustic and linguistic aspects of speech recognizers in a unified representation by integrating the powerful frameworks of deep learning and weighted finite-state transducers (WFSTs). We evaluated the proposed method in an experiment on recognizing a lecture speech dataset, which is considered spontaneous speech, and confirmed that the proposed method is promising for recognizing spontaneous speech.
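The abstract describes the approach only at a high level: frame-level scores from a deep acoustic model are combined with a weighted finite-state transducer that encodes lexical and language-model constraints, and decoding searches the combined space. The toy Python sketch below illustrates that general idea only; the random "acoustic model", the three-phone set, the tiny hand-rolled transducer, and all weights are invented for illustration and are not the authors' system.

# Minimal, illustrative sketch of hybrid "neural acoustic model + WFST" decoding.
# Everything here (phone set, weights, arcs) is made up for demonstration and
# does not reflect the actual system described in the article.

import numpy as np

# --- Toy acoustic model: maps feature frames to phone log-posteriors ---
rng = np.random.default_rng(0)
PHONES = ["a", "b", "c"]             # hypothetical phone set
W1 = rng.normal(size=(4, 8))         # random weights stand in for trained ones
W2 = rng.normal(size=(8, len(PHONES)))

def acoustic_log_probs(frames):
    """Return per-frame log-posteriors over PHONES for a (T, 4) feature matrix."""
    h = np.tanh(frames @ W1)
    logits = h @ W2
    return logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)

# --- Toy WFST: arcs are (src_state, input_phone, output_word_or_None, cost, dst_state) ---
# Accepts "a b" -> word "ab" or "a c" -> word "ac"; costs play the role of LM scores.
ARCS = [
    (0, "a", None, 0.0, 1),
    (1, "b", "ab", 0.7, 2),
    (1, "c", "ac", 0.4, 2),
]
START, FINAL = 0, {2}

def decode(frames):
    """Viterbi-style search over WFST states, combining acoustic and transducer scores."""
    logp = acoustic_log_probs(frames)
    best = {START: (0.0, [])}                # state -> (score, emitted words)
    for t in range(len(frames)):
        nxt = {}
        for src, phone, word, cost, dst in ARCS:
            if src not in best:
                continue
            score = best[src][0] + logp[t, PHONES.index(phone)] - cost
            out = best[src][1] + ([word] if word else [])
            if dst not in nxt or score > nxt[dst][0]:
                nxt[dst] = (score, out)
        best = nxt or best
    finals = {s: v for s, v in best.items() if s in FINAL}
    return max(finals.values(), key=lambda v: v[0]) if finals else (float("-inf"), [])

if __name__ == "__main__":
    score, words = decode(rng.normal(size=(2, 4)))
    print("best path score:", round(float(score), 3), "words:", words)

In a real hybrid system the transducer would be a full decoding graph built with an FST toolkit and the acoustic scores would come from a trained deep neural network, but the same idea of combining the two score sources along transducer paths applies.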