Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024, Vol. 133, No. 2
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their impact on media production, with a focus on the transformer model's self-attention mechanism for modeling sequential relationships. It compares the accuracy and performance of leading ASR models such as Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model, along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. Challenges such as limited data for certain languages and the complexity of linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to use advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
Pages: 48-57
Page count: 9
Related papers
50 in total
  • [1] Machine Learning in Automatic Speech Recognition: A Survey
    Padmanabhan, Jayashree
    Premkumar, Melvin Jose Johnson
    IETE TECHNICAL REVIEW, 2015, 32 (04) : 240 - 251
  • [2] Applying Machine Learning Techniques for Speech Emotion Recognition
    Tarunika, K.
    Pradeeba, R. B.
    Aruna, P.
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018,
  • [3] Applying Machine Learning and Automatic Speech Recognition for Intelligent Evaluation of Coal Failure Probability under Uniaxial Compression
    Wang, Honglei
    Li, Zhenlei
    Song, Dazhao
    He, Xueqiu
    Khan, Majid
    MINERALS, 2022, 12 (12)
  • [4] Automatic Speech Recognition: A survey of deep learning techniques and approaches
    Ahlawat, Harsh
    Aggarwal, Naveen
    Gupta, Deepti
    International Journal of Cognitive Computing in Engineering, 2025, 6 : 201 - 237
  • [6] The Automatic Recognition of Sepedi Speech Emotions based on Machine Learning Algorithms
    Manamela, Phuti J.
    Manamela, Madimetja J.
    Modipa, Thipe I.
    Sefara, Tshepisho J.
    Mokgonyane, Tumisho B.
    2018 INTERNATIONAL CONFERENCE ON ADVANCES IN BIG DATA, COMPUTING AND DATA COMMUNICATION SYSTEMS (ICABCD), 2018,
  • [7] A Machine Learning Based System for the Automatic Evaluation of Aphasia Speech
    Kohlschein, Christian
    Schmitt, Maximilian
    Schuller, Bjoern
    Jeschke, Sabina
    Werner, Cornelius J.
    2017 IEEE 19TH INTERNATIONAL CONFERENCE ON E-HEALTH NETWORKING, APPLICATIONS AND SERVICES (HEALTHCOM), 2017,
  • [8] Speech emotion recognition for psychotherapy: an analysis of traditional machine learning and deep learning techniques
    Shah, Nidhi
    Sood, Kanika
    Arora, Jayraj
    2023 IEEE 13TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE, CCWC, 2023, : 718 - 723
  • [9] Towards Automatic Assessment of Aphasia Speech Using Automatic Speech Recognition Techniques
    Qin, Ying
    Lee, Tan
    Kong, Anthony Pak Hin
    Law, Sam Po
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [10] Continual Learning in Automatic Speech Recognition
    Sadhu, Samik
    Hermansky, Hynek
    INTERSPEECH 2020, 2020, : 1246 - 1250