Machine Learning Paradigms for Speech Recognition: An Overview

Cited by: 225
Authors
Deng, Li [1]
Li, Xiao [1]
Affiliation
[1] Microsoft Res, Redmond, WA 98052 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013, Vol. 21, No. 5
Keywords
Machine learning; speech recognition; supervised; unsupervised; discriminative; generative; dynamics; adaptive; Bayesian; deep learning; HIDDEN MARKOV-MODELS; DEEP NEURAL-NETWORKS; MINIMUM PHONE ERROR; BAYES-RISK; PARAMETER-ESTIMATION; MIXTURE OBSERVATIONS; CONVEX-OPTIMIZATION; MAXIMUM-LIKELIHOOD; MULTIPLE TASKS; CLASSIFICATION;
DOI
10.1109/TASL.2013.2244083
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it is largely an unsolved problem: for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state of the art in ASR technology. This article provides readers with an overview of modern ML techniques as utilized in current ASR research and systems and as relevant to future ones. The intent is to foster more cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either popular already or have potential for making significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments of deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.
Pages: 1060 - 1089
Page count: 30
Related papers
50 in total
  • [31] Deep Learning and Machine Learning for Malaria Detection: Overview, Challenges and Future Directions
    Jdey, Imen
    Hcini, Hazala
    Ltifi, Hela
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2024, 23 (05) : 1745 - 1776
  • [32] Speech Recognition using Deep Learning
    Lakkhanawannakun, Phoemporn
    Noyunsan, Chaluemwut
    2019 34TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2019), 2019, : 514 - 517
  • [33] Augmenting machine learning for Amharic speech recognition: a paradigm of patient's lips motion detection
    Birara, Muluken
    Gebremeskel, Gebeyehu Belay
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 24377 - 24397
  • [34] Augmenting machine learning for Amharic speech recognition: a paradigm of patient’s lips motion detection
    Muluken Birara
    Gebeyehu Belay Gebremeskel
    Multimedia Tools and Applications, 2022, 81 : 24377 - 24397
  • [35] Deep learning: from speech recognition to language and multimodal processing
    Deng, Li
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2016, 5
  • [36] Dual supervised learning for non-native speech recognition
    Radzikowski, Kacper
    Nowak, Robert
    Wang, Le
    Yoshie, Osamu
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (1)
  • [37] Dual supervised learning for non-native speech recognition
    Kacper Radzikowski
    Robert Nowak
    Le Wang
    Osamu Yoshie
    EURASIP Journal on Audio, Speech, and Music Processing, 2019
  • [38] A Comprehensive Review on Machine Learning Approaches for Enhancing Human Speech Recognition
    Shanshool, Maha Adnan
    Abdulmohsin, Husam Ali
    TRAITEMENT DU SIGNAL, 2023, 40 (05) : 2121 - 2129
  • [39] Speech emotion recognition method in educational scene based on machine learning
    Zhang, Yanning
    Srivastava, Gautam
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2022, 9 (05)
  • [40] Research on the Application of Machine Learning in the Field of Speech Recognition and Path Planning
    , Gary
    2021 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, INFORMATION AND COMMUNICATION ENGINEERING, 2021, 11933