Speech recognition based on unified model of acoustic and language aspects of speech

Authors
[1] Kubo, Yotaro
[2] Ogawa, Atsunori
[3] Hori, Takaaki
[4] Nakamura, Atsushi
Source
Nippon Telegraph and Telephone Corp. | Year: 1600 / Issue: 11
Keywords
Deep learning
DOI
Not available
Abstract
Automatic speech recognition has attracted considerable attention recently and is considered an important technique for achieving natural interaction between humans and machines. However, recognizing spontaneous speech remains difficult because of the wide variety of patterns it contains. We have been researching ways to overcome this problem and have developed a method that expresses both the acoustic and the linguistic aspects of a speech recognizer in a unified representation by integrating two powerful frameworks: deep learning and weighted finite-state transducers (WFSTs). We evaluated the proposed method in an experiment on recognizing a lecture speech dataset, which is regarded as a spontaneous speech dataset, and confirmed that the method is promising for recognizing spontaneous speech.
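The abstract does not describe the construction in detail, but the general WFST idea it builds on can be sketched. Acoustic and linguistic knowledge are each encoded as weighted transducers whose weights are costs (negative log scores), composition merges them into a single machine, and decoding becomes a shortest-path search. The sketch below is a toy, epsilon-free composition in the tropical semiring, not the authors' method. The `WFST` class, the `compose` and `shortest_path` helpers, the segment labels `s1`/`s2`, and every weight are hypothetical, invented only to show how language-model costs can overturn an acoustically preferred hypothesis.

```python
import heapq
from collections import defaultdict


class WFST:
    """Minimal weighted transducer; weights are costs in the tropical semiring
    (path weight = sum of arc costs, best path = minimum cost)."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        self.arcs = defaultdict(list)  # state -> [(ilabel, olabel, cost, next_state)]

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))


def compose(a, b):
    """Epsilon-free composition: a's output labels must match b's input labels."""
    start = (a.start, b.start)
    result = WFST(start, [])
    stack, seen = [start], {start}
    while stack:
        qa, qb = stack.pop()
        if qa in a.finals and qb in b.finals:
            result.finals.add((qa, qb))
        for ia, oa, wa, na in a.arcs[qa]:
            for ib, ob, wb, nb in b.arcs[qb]:
                if oa == ib:
                    nxt = (na, nb)
                    # Tropical "times" is ordinary addition of costs.
                    result.add_arc((qa, qb), ia, ob, wa + wb, nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return result


def shortest_path(t):
    """Dijkstra search (costs must be non-negative); returns (cost, output labels)."""
    heap = [(0.0, 0, t.start, ())]
    counter = 1  # tie-breaker so heap entries never compare states directly
    done = set()
    while heap:
        cost, _, q, out = heapq.heappop(heap)
        if q in done:
            continue
        done.add(q)
        if q in t.finals:
            return cost, list(out)
        for _, olabel, w, nq in t.arcs[q]:
            if nq not in done:
                heapq.heappush(heap, (cost + w, counter, nq, out + (olabel,)))
                counter += 1
    return float("inf"), []


# Hypothetical acoustic transducer: segment IDs -> words, with costs standing in
# for (negated, log-domain) neural-network acoustic scores.
am = WFST(0, [2])
am.add_arc(0, "s1", "recognize", 1.0, 1)
am.add_arc(0, "s1", "wreck", 0.5, 1)
am.add_arc(1, "s2", "speech", 0.5, 2)
am.add_arc(1, "s2", "beach", 0.25, 2)

# Hypothetical bigram-style language model as a word acceptor (input == output).
lm = WFST(0, [3])
lm.add_arc(0, "recognize", "recognize", 0.5, 1)
lm.add_arc(0, "wreck", "wreck", 2.0, 2)
lm.add_arc(1, "speech", "speech", 0.25, 3)
lm.add_arc(1, "beach", "beach", 3.0, 3)
lm.add_arc(2, "speech", "speech", 2.5, 3)
lm.add_arc(2, "beach", "beach", 1.0, 3)

print(shortest_path(am))               # (0.75, ['wreck', 'beach'])
print(shortest_path(compose(am, lm)))  # (2.25, ['recognize', 'speech'])
```

With acoustic costs alone, the search prefers "wreck beach" (cost 0.75); after composing with the language model, "recognize speech" wins (cost 2.25), because both knowledge sources now live in one weighted graph. Real systems add epsilon handling, determinization, and minimization, which this sketch deliberately omits.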