Speech recognition based on unified model of acoustic and language aspects of speech

Cited by: 0
Authors
Affiliations
[1] Kubo, Yotaro
[2] Ogawa, Atsunori
[3] Hori, Takaaki
[4] Nakamura, Atsushi
Source
| Year 1600 / Vol. Nippon Telegraph and Telephone Corp. / Issue 11
Keywords
Deep learning;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Automatic speech recognition has been attracting a lot of attention recently and is considered an important technique for achieving natural interaction between humans and machines. However, recognizing spontaneous speech is still considered difficult owing to the wide variety of patterns in spontaneous speech. We have been researching ways to overcome this problem and have developed a method to express both the acoustic and linguistic aspects of speech recognizers in a unified representation by integrating the powerful frameworks of deep learning and weighted finite-state transducers. We evaluated the proposed method in an experiment to recognize a lecture speech dataset, which is considered a spontaneous speech dataset, and confirmed that the proposed method is promising for recognizing spontaneous speech.