Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 02
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, with a focus on the transformer model's self-attention mechanism for understanding sequential relationships. It compares accuracy and performance of top ASR models like Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. Challenges such as limited data for certain languages and complexities in linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to utilize advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
Pages: 48-57
Page count: 9