Thinking about the present and future of the complex speech recognition

Cited by: 0

Authors:

Vicsi, Klara [1]

Affiliation:

[1] Budapest Univ Technol & Econ, Dept Telecommun & Mediainformat, Lab Speech Acoust, Budapest, Hungary

Source:

3RD IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2012) | 2012

Keywords:

component; speech recognition; speech to text transformation system; multi-modal speech processing; multi-stream modelling; FEATURES;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A critical factor for most cognitive info-communication systems is the state of development of speech recognition technology. The paper gives a short introduction to the principles of current speech recognition technology. It highlights the fact that the systems on the market are merely speech-to-text transformers that output only a word chain, leaving out speech prosody, speech emotion, speech style, and other information. Many uncertainties exist in the operation of such systems. Some up-to-date research trends, mainly parallel processing, are introduced that aim to increase recognition efficiency. Finally, the META-NET research agenda for Multilingual Europe 2020 is briefly introduced.

Pages: 371-376

Page count: 6
