Machine Learning Paradigms for Speech Recognition: An Overview

Cited by: 225

Authors:

Deng, Li [1]

Li, Xiao [1]

Affiliation:

[1] Microsoft Res, Redmond, WA 98052 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Volume 21 / Issue 05

Keywords:

Machine learning; speech recognition; supervised; unsupervised; discriminative; generative; dynamics; adaptive; Bayesian; deep learning; HIDDEN MARKOV-MODELS; DEEP NEURAL-NETWORKS; MINIMUM PHONE ERROR; BAYES-RISK; PARAMETER-ESTIMATION; MIXTURE OBSERVATIONS; CONVEX-OPTIMIZATION; MAXIMUM-LIKELIHOOD; MULTIPLE TASKS; CLASSIFICATION;

DOI:

10.1109/TASL.2013.2244083

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it is largely an unsolved problem: for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state-of-the-art in ASR technology. This overview article provides readers with an overview of modern ML techniques as utilized in the current and as relevant to future ASR research and systems. The intent is to foster further cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either popular already or have potential for making significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments of deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.

Pages: 1060 - 1089

Page count: 30
