Machine Learning Paradigms for Speech Recognition: An Overview

Cited by: 225
Authors
Deng, Li [1]
Li, Xiao [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / Issue 05
Keywords
Machine learning; speech recognition; supervised; unsupervised; discriminative; generative; dynamics; adaptive; Bayesian; deep learning; HIDDEN MARKOV-MODELS; DEEP NEURAL-NETWORKS; MINIMUM PHONE ERROR; BAYES-RISK; PARAMETER-ESTIMATION; MIXTURE OBSERVATIONS; CONVEX-OPTIMIZATION; MAXIMUM-LIKELIHOOD; MULTIPLE TASKS; CLASSIFICATION
DOI
10.1109/TASL.2013.2244083
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it is largely an unsolved problem: for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state of the art in ASR technology. This article provides readers with an overview of modern ML techniques as utilized in current ASR research and systems and as relevant to future ones. The intent is to foster more cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either popular already or have potential for making significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments of deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.
Pages: 1060 - 1089
Page count: 30
Related papers
  • [1] Speech recognition using machine learning
    Vashisht V.
    Pandey A.K.
    Yadav S.P.
    IEIE Transactions on Smart Processing and Computing, 2021, 10 (03) : 233 - 239
  • [2] Machine Learning in Automatic Speech Recognition: A Survey
    Padmanabhan, Jayashree
    Premkumar, Melvin Jose Johnson
    IETE TECHNICAL REVIEW, 2015, 32 (04) : 240 - 251
  • [3] Neutrosophic speech recognition Algorithm for speech under stress by Machine learning
    Nagarajan D.
    Broumi S.
    Smarandache F.
    Neutrosophic Sets and Systems, 2023, 55 : 46 - 57
  • [4] IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients
    Olatinwo, Damilola D.
    Abu-Mahfouz, Adnan
    Hancke, Gerhard
    Myburgh, Hermanus
    SENSORS, 2023, 23 (06)
  • [5] Acoustic Modeling Based on Deep Learning for Low-Resource Speech Recognition: An Overview
    Yu, Chongchong
    Kang, Meng
    Chen, Yunbing
    Wu, Jiajia
    Zhao, Xia
    IEEE ACCESS, 2020, 8 : 163829 - 163843
  • [6] Speech emotion recognition using machine learning - A systematic review
    Madanian, Samaneh
    Chen, Talen
    Adeleye, Olayinka
    Templeton, John Michael
    Poellabauer, Christian
    Parry, Dave
    Schneidere, Sandra L.
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2023, 20
  • [7] Machine Learning Approach for Emotion Recognition in Speech
    Gjoreski, Martin
    Gjoreski, Hristijan
    Kulakov, Andrea
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2014, 38 (04) : 377 - 383
  • [8] A Spoken English Teaching System Based on Speech Recognition and Machine Learning
    Jiao, Fengming
    Song, Jiao
    Zhao, Xin
    Zhao, Ping
    Wang, Ru
    INTERNATIONAL JOURNAL OF EMERGING TECHNOLOGIES IN LEARNING, 2021, 16 (14) : 68 - 82
  • [9] The Automatic Recognition of Sepedi Speech Emotions based on Machine Learning Algorithms
    Manamela, Phuti J.
    Manamela, Madimetja J.
    Modipa, Thipe I.
    Sefara, Tshepisho J.
    Mokgonyane, Tumisho B.
    2018 INTERNATIONAL CONFERENCE ON ADVANCES IN BIG DATA, COMPUTING AND DATA COMMUNICATION SYSTEMS (ICABCD), 2018
  • [10] Speech emotion recognition for psychotherapy: an analysis of traditional machine learning and deep learning techniques
    Shah, Nidhi
    Sood, Kanika
    Arora, Jayraj
    2023 IEEE 13TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE, CCWC, 2023 : 718 - 723