Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

Cited by: 0
Authors
Lin, Yist Y. [1 ]
Han, Tao [1 ]
Xu, Haihua [1 ]
Van Tung Pham [1 ]
Khassanov, Yerbolat [1 ]
Chong, Tze Yuang [1 ]
He, Yi [1 ]
Lu, Lu [1 ]
Ma, Zejun [1 ]
Affiliation
[1] ByteDance, Beijing, People's Republic of China
Source
INTERSPEECH 2023 | 2023
Keywords
random utterance concatenation; data augmentation; short video; end-to-end; speech recognition;
DOI
10.21437/Interspeech.2023-1272
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance degrades when training and test utterance lengths are mismatched. In this paper, we propose an on-the-fly random utterance concatenation (RUC) based data augmentation method to alleviate the train-test utterance length mismatch for the short-video ASR task. Specifically, we are motivated by the observation that our human-transcribed training utterances of short-video spontaneous speech tend to be much shorter (~3 seconds on average), while our test utterances, generated by a voice activity detection front-end, are much longer (~10 seconds on average). Such a mismatch can lead to suboptimal performance. Empirically, we observe that the proposed RUC method significantly improves long-utterance recognition without a performance drop on short utterances. Overall, it achieves a 5.72% word error rate reduction on average across 15 languages and improved robustness to varying utterance lengths.
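The abstract does not give implementation details, so the following is only a minimal sketch of what on-the-fly random utterance concatenation could look like, not the authors' method. The names `Utterance`, `concat_utterances`, and `ruc_augment`, as well as the parameters `max_concat` and `prob`, are hypothetical. The sketch assumes that each training utterance is, with some probability, joined with a few other utterances drawn at random from the same batch, and that transcripts can be joined with a space (which would not hold for languages written without word delimiters).

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    samples: List[float]  # mono waveform samples (assumed representation)
    transcript: str       # human transcription


def concat_utterances(utts: List[Utterance], sep: str = " ") -> Utterance:
    """Join waveforms end to end and join transcripts with `sep`."""
    samples: List[float] = []
    for u in utts:
        samples.extend(u.samples)
    return Utterance(samples, sep.join(u.transcript for u in utts))


def ruc_augment(
    batch: List[Utterance],
    max_concat: int = 3,   # hypothetical cap on utterances per training example
    prob: float = 0.5,     # hypothetical probability of applying concatenation
    rng: Optional[random.Random] = None,
) -> List[Utterance]:
    """Return a batch of the same size in which some utterances are lengthened."""
    rng = rng or random.Random()
    out: List[Utterance] = []
    for utt in batch:
        others = [u for u in batch if u is not utt]
        if others and max_concat >= 2 and rng.random() < prob:
            # Pick how many extra utterances to append, then sample them.
            n_extra = min(rng.randint(1, max_concat - 1), len(others))
            extra = rng.sample(others, n_extra)
            out.append(concat_utterances([utt] + extra))
        else:
            out.append(utt)  # keep the short utterance unchanged
    return out
```

Because the function is applied per batch at training time, the model would see a different mix of short and concatenated utterances in each epoch, which is one plausible reading of "on-the-fly" in the abstract.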
Pages: 904 - 908
Page count: 5
Related papers
50 in total
  • [31] Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
    Jin, Zengrui
    Geng, Mengzhe
    Deng, Jiajun
    Wang, Tianzi
    Hu, Shujie
    Li, Guinan
    Liu, Xunying
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 413 - 429
  • [32] A Survey of the Effects of Data Augmentation for Automatic Speech Recognition Systems
    Manuel Ramirez, Jose
    Montalvo, Ana
    Ramon Calvo, Jose
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS (CIARP 2019), 2019, 11896 : 669 - 678
  • [33] Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation
    Tao, Huawei
    Shan, Shuai
    Hu, Ziyi
    Zhu, Chunhua
    Ge, Hongyi
    ENTROPY, 2023, 25 (01)
  • [34] Data Augmentation using Healthy Speech for Dysarthric Speech Recognition
    Vachhani, Bhavik
    Bhat, Chitralekha
    Kopparapu, Sunil Kumar
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 471 - 475
  • [35] Speech recognition and utterance verification based on a generalized confidence score
    Koo, MW
    Lee, CH
    Juang, BH
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (08): : 821 - 832
  • [36] Exploring data augmentation for Amazigh speech recognition with convolutional neural networks
    Hossam Boulal
    Farida Bouroumane
    Mohamed Hamidi
    Jamal Barkani
    Mustapha Abarkan
    International Journal of Speech Technology, 2025, 28 (1) : 53 - 65
  • [37] MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION
    Meng, Linghui
    Xu, Jin
    Tan, Xu
    Wang, Jindong
    Qin, Tao
    Xu, Bo
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7008 - 7012
  • [38] Data Augmentation for Improving Explainability of Hate Speech Detection
    Gunjan Ansari
    Parmeet Kaur
    Chandni Saxena
    Arabian Journal for Science and Engineering, 2024, 49 : 3609 - 3621
  • [39] Data Augmentation for Improving Explainability of Hate Speech Detection
    Ansari, Gunjan
    Kaur, Parmeet
    Saxena, Chandni
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024, 49 (03) : 3609 - 3621
  • [40] Enhanced Speech Emotion Recognition Using DCGAN-Based Data Augmentation
    Baek, Ji-Young
    Lee, Seok-Pil
    Tsihrintzis, George A.
    ELECTRONICS, 2023, 12 (18)