Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Cited by: 0

Authors:

Fayan R. [1]

Montajabi Z. [2]

Gonsalves R. [1]

Affiliations:

[1] Avid, United States

[2] Avid Technology, United States

Source:

SMPTE Motion Imaging Journal | 2024, Vol. 133, No. 2

Keywords:

ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING

DOI:

10.5594/JMI.2024/IPYX8877

Abstract:

This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, with a focus on the transformer model's self-attention mechanism for understanding sequential relationships. It compares accuracy and performance of top ASR models like Meta's Multilingual Machine Speech, OpenAI's Whisper, and Google's Universal Speech Model along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. Challenges such as limited data for certain languages and complexities in linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to utilize advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.

Pages: 48-57

Page count: 9
