Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Cited by: 0
Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 02
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their impact on media production, with a focus on the transformer model's self-attention mechanism for modeling sequential relationships. It compares the accuracy and performance of leading ASR models such as Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model, along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. It highlights challenges such as limited data for certain languages and the complexity of linguistic nuances. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to use advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
Pages: 48-57
Page count: 9