Machine Learning Paradigms for Speech Recognition: An Overview

Cited by: 225
Authors
Deng, Li [1]
Li, Xiao [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 05
Keywords
Machine learning; speech recognition; supervised; unsupervised; discriminative; generative; dynamics; adaptive; Bayesian; deep learning; HIDDEN MARKOV-MODELS; DEEP NEURAL-NETWORKS; MINIMUM PHONE ERROR; BAYES-RISK; PARAMETER-ESTIMATION; MIXTURE OBSERVATIONS; CONVEX-OPTIMIZATION; MAXIMUM-LIKELIHOOD; MULTIPLE TASKS; CLASSIFICATION;
DOI
10.1109/TASL.2013.2244083
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it is largely an unsolved problem: for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state of the art in ASR technology. This article gives readers an overview of modern ML techniques as used in current ASR research and systems and as relevant to future ones. The intent is to foster more cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either already popular or have the potential to make significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments in deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.
Pages: 1060 - 1089
Number of pages: 30
Related papers
50 in total
  • [41] Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools
    Fayan R.
    Montajabi Z.
    Gonsalves R.
    SMPTE Motion Imaging Journal, 2024, 133 (02): : 48 - 57
  • [42] Machine Learning Applied to Speech Recordings for Parkinson's Disease Recognition
    Aversano, Lerina
    Bernardi, Mario L.
    Cimitile, Marta
    Iammarino, Martina
    Madau, Antonella
    Verdone, Chiara
    DEEP LEARNING THEORY AND APPLICATIONS, DELTA 2023, 2023, 1875 : 101 - 114
  • [43] HARTH: A Human Activity Recognition Dataset for Machine Learning
    Logacjov, Aleksej
    Bach, Kerstin
    Kongsvold, Atle
    Bardstu, Hilde Bremseth
    Mork, Paul Jarle
    SENSORS, 2021, 21 (23)
  • [44] An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
    Dey, Spandan
    Sahidullah, Md
    Saha, Goutam
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (06)
  • [45] Survey of Deep Learning Paradigms for Speech Processing
    Bhangale, Kishor Barasu
    Kothandaraman, Mohanaprasad
    WIRELESS PERSONAL COMMUNICATIONS, 2022, 125 (02) : 1913 - 1949
  • [46] Survey of Deep Learning Paradigms for Speech Processing
    Kishor Barasu Bhangale
    Mohanaprasad Kothandaraman
    Wireless Personal Communications, 2022, 125 : 1913 - 1949
  • [47] On the In Vivo Recognition of Kidney Stones Using Machine Learning
    Lopez-Tiro, Francisco
    Estrade, Vincent
    Hubert, Jacques
    Flores-Araiza, Daniel
    Gonzalez-Mendoza, Miguel
    Ochoa-Ruiz, Gilberto
    Daul, Christian
    IEEE ACCESS, 2024, 12 : 10736 - 10759
  • [48] Learning Speech Rate in Speech Recognition
    Zeng, Xiangyu
    Yin, Shi
    Wang, Dong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 528 - 532
  • [49] Taxonomy of machine learning paradigms: A data-centric perspective
    Emmert-Streib, Frank
    Dehmer, Matthias
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2022, 12 (05)
  • [50] A Comparison of Machine Learning Approaches for Detecting Misogynistic Speech in Urban Dictionary
    Lynn, Theo
    Endo, Patricia Takako
    Rosati, Pierangelo
    Silva, Ivanovitch
    Santos, Guto Leoni
    Ging, Debbie
    2019 INTERNATIONAL CONFERENCE ON CYBER SITUATIONAL AWARENESS, DATA ANALYTICS AND ASSESSMENT (CYBER SA), 2019,