Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Cited by: 0

Authors:

Fayan R. [1]

Montajabi Z. [2]

Gonsalves R. [1]

Affiliations:

[1] Avid, United States

[2] Avid Technology, United States

Source:

SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 02

Keywords:

ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING

DOI:

10.5594/JMI.2024/IPYX8877

Chinese Library Classification number:

Subject classification number:

Abstract:

This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, with a focus on the transformer model's self-attention mechanism for understanding sequential relationships. It compares the accuracy and performance of leading ASR models, such as Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model, along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. It highlights challenges such as limited data for certain languages and the complexity of linguistic nuances. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to use advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.

Pages: 48-57

Page count: 9
