Machine Learning Paradigms for Speech Recognition: An Overview

Cited by: 225
Authors
Deng, Li [1]
Li, Xiao [1]
Affiliation
[1] Microsoft Research, Redmond, WA 98052, USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 05
Keywords
Machine learning; speech recognition; supervised; unsupervised; discriminative; generative; dynamics; adaptive; Bayesian; deep learning; HIDDEN MARKOV-MODELS; DEEP NEURAL-NETWORKS; MINIMUM PHONE ERROR; BAYES-RISK; PARAMETER-ESTIMATION; MIXTURE OBSERVATIONS; CONVEX-OPTIMIZATION; MAXIMUM-LIKELIHOOD; MULTIPLE TASKS; CLASSIFICATION;
DOI
10.1109/TASL.2013.2244083
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application for rigorously testing the effectiveness of a given technique, and as a source of new problems that arise from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is commercially available for some applications, it remains largely an unsolved problem: for almost all applications, ASR performance is not on par with human performance. New insight from modern ML methodology shows great promise for advancing the state of the art in ASR technology. This article provides readers with an overview of modern ML techniques as utilized in current ASR research and systems and as relevant to future ones. The intent is to foster more cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either already popular or have the potential to make significant contributions to ASR technology. The paradigms presented and elaborated in this overview include generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. Finally, we present and analyze recent developments in deep learning and in learning with sparse representations, focusing on their direct relevance to advancing ASR technology.
Pages: 1060-1089
Number of pages: 30
Related papers
50 in total
  • [21] Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
    Zhang, Zixing
    Geiger, Juergen
    Pohjalainen, Jouni
    Mousa, Amr El-Desoky
    Jin, Wenyu
    Schuller, Bjoern
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2018, 9 (05)
  • [22] Deep learning: an overview and main paradigms
    Golovko V.A.
    Optical Memory and Neural Networks, 2017, 26 (1) : 1 - 17
  • [23] Research on Depression Recognition Using Machine Learning from Speech
    Shi, Daimin
    Lu, Xiaoyong
    Liu, Yang
    Yuan, Jingyi
    Pan, Tao
    Li, Yanqin
    2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021: 52 - 56
  • [24] Hyperspectral Anomaly Detection Based on Machine Learning: An Overview
    Xu, Yichu
    Zhang, Lefei
    Du, Bo
    Zhang, Liangpei
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 3351 - 3364
  • [25] A review on speech processing using machine learning paradigm
    Bhangale, Kishor Barasu
    Mohanaprasad, K.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 367 - 388
  • [26] Recognition of Arabic Accents From English Spoken Speech Using Deep Learning Approach
    Habbash, Mansoor
    Mnasri, Sami
    Alghamdi, Mansoor
    Alrashidi, Malek
    Tarawneh, Ahmad S.
    Gumair, Abdullah
    Hassanat, Ahmad B.
    IEEE ACCESS, 2024, 12 : 37219 - 37230
  • [27] Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks
    Ri, Francesco Ardan Dal
    Ciardi, Fabio Cifariello
    Conci, Nicola
    IEEE ACCESS, 2023, 11 : 116638 - 116649
  • [28] An Overview of Machine Learning Methods for Multiple Target Tracking
    Chong, Chee-Yee
    2021 IEEE 24TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2021: 182 - 190
  • [29] Speech recognition in a dialog system: from conventional to deep processing
    Becerra, Aldonso
    Ismael de la Rosa, J.
    Gonzalez, Efren
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (12) : 15875 - 15911
  • [30] Part of speech tagging: a systematic review of deep learning and machine learning approaches
    Chiche, Alebachew
    Yitagesu, Betselot
    JOURNAL OF BIG DATA, 2022, 9 (01)