Thinking about the present and future of the complex speech recognition

Cited by: 0
Author
Vicsi, Klara [1]
Affiliation
[1] Budapest Univ Technol & Econ, Dept Telecommun & Mediainformat, Lab Speech Acoust, Budapest, Hungary
Source
3RD IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2012) | 2012
Keywords
component; speech recognition; speech to text transformation system; multi-modal speech processing; multi-stream modelling; FEATURES;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A critical point for most cognitive info-communication systems is the state of development of speech recognition technology. The paper gives a short introduction to the principles of today's speech recognition technology. It highlights the fact that the systems on the market are only speech-to-text transformers that output nothing but a word chain; speech prosody, speech emotion, speech style, and other information are not involved. Many uncertainties exist in such operational systems. Some up-to-date research trends, mostly parallel processing, are introduced that aim to increase the efficiency of recognition. Finally, the research agenda of META-NET for Multilingual Europe in 2020 is briefly introduced.
Pages: 371-376
Number of pages: 6
Related papers
50 in total
  • [41] TranslatAble: Giving Individuals with Complex Communication Needs a Voice through Speech and Gesture Recognition
    Moore, Meredith
    Panchanathan, Sethuraman
    ASSETS'16: PROCEEDINGS OF THE 18TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2016, : 321 - 322
  • [42] Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients
    Sen, Tjong Wan
    Trilaksono, Bambang Riyanto
    Arman, Arry Akhmad
    Mandala, Rila
    JOURNAL OF ICT RESEARCH AND APPLICATIONS, 2009, 3 (02) : 123 - 134
  • [43] β-Masking MMSE Speech Enhancement for Speech Recognition
    You, Chang Huai
    Ma, Bin
    2017 IEEE 2ND INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2017, : 341 - 345
  • [44] SPEECH AUGMENTATION USING WAVENET IN SPEECH RECOGNITION
    Wang, Jisung
    Kim, Sangki
    Lee, Yeha
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6770 - 6774
  • [45] Automatic speech recognition and speech variability: A review
    Benzeghiba, M.
    De Mori, R.
    Deroo, O.
    Dupont, S.
    Erbes, T.
    Jouvet, D.
    Fissore, L.
    Laface, P.
    Mertins, A.
    Ris, C.
    Rose, R.
    Tyagi, V.
    Wellekens, C.
    SPEECH COMMUNICATION, 2007, 49 (10-11) : 763 - 786
  • [46] A novel channel estimate for noise robust speech recognition
    Vanderreydt, Geoffroy
    Demuynck, Kris
    COMPUTER SPEECH AND LANGUAGE, 2024, 86
  • [47] Depression Detection in Arabic Using Speech Language Recognition
    Alsharif, Zainab
    Elhag, Salma
    Alfakeh, Sulhi
    2022 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MACHINE LEARNING APPLICATIONS (CDMA 2022), 2022, : 61 - 66
  • [48] Confusion analysis in phoneme based speech recognition in Hindi
    Bhatt, Shobha
    Dev, Amita
    Jain, Anurag
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2020, 11 (10) : 4213 - 4238
  • [49] A Study on Automatic Recognition of Positive and Negative Emotions in Speech
    Pavaloi, I.
    Ciobanu, A.
    Luca, M.
    Musca, E.
    Barbu, T.
    Ignat, Anca
    2014 18TH INTERNATIONAL CONFERENCE SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC), 2014, : 221 - 224
  • [50] Evaluating deep learning architectures for Speech Emotion Recognition
    Fayek, Haytham M.
    Lech, Margaret
    Cavedon, Lawrence
    NEURAL NETWORKS, 2017, 92 : 60 - 68