Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

Cited by: 0

Authors:

Lin, Yist Y. [1]

Han, Tao [1]

Xu, Haihua [1]

Van Tung Pham [1]

Khassanov, Yerbolat [1]

Chong, Tze Yuang [1]

He, Yi [1]

Lu, Lu [1]

Ma, Zejun [1]

Affiliations:

[1] ByteDance, Beijing, People's Republic of China

Source:

INTERSPEECH 2023 | 2023

Keywords:

random utterance concatenation; data augmentation; short video; end-to-end; speech recognition;

DOI:

10.21437/Interspeech.2023-1272

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance is compromised when training and test utterance lengths are mismatched. In this paper, we propose an on-the-fly random utterance concatenation (RUC) based data augmentation method to alleviate the train-test utterance length mismatch in the short-video ASR task. Specifically, we are motivated by the observation that our human-transcribed training utterances for short-video spontaneous speech tend to be much shorter (approximately 3 seconds on average), while our test utterances, generated by a voice activity detection front-end, are much longer (approximately 10 seconds on average). Such a mismatch can lead to suboptimal performance. Empirically, we observe that the proposed RUC method significantly improves long-utterance recognition without degrading performance on short utterances. Overall, it achieves a 5.72% word error rate reduction on average across 15 languages and improved robustness to varying utterance lengths.
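The abstract does not give the details of RUC, so the following is only a minimal sketch of the general idea of on-the-fly utterance concatenation, not the authors' implementation. The function name `random_utterance_concat`, the parameters `max_concat` and `concat_prob`, the `Utterance` type, and the choice to join transcripts with a space are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (waveform samples, transcript).
Utterance = Tuple[List[float], str]


def random_utterance_concat(
    batch: Sequence[Utterance],
    max_concat: int = 3,       # assumed cap on utterances joined into one sample
    concat_prob: float = 0.5,  # assumed probability of augmenting a sample
    rng: Optional[random.Random] = None,
) -> List[Utterance]:
    """Lengthen some utterances by appending randomly chosen ones from the batch."""
    if rng is None:
        rng = random.Random()
    augmented: List[Utterance] = []
    for wav, text in batch:
        new_wav = list(wav)  # copy so the input batch is left untouched
        new_text = text
        if max_concat > 1 and rng.random() < concat_prob:
            # Append between 1 and (max_concat - 1) extra utterances.
            n_extra = rng.randint(1, max_concat - 1)
            for _ in range(n_extra):
                other_wav, other_text = rng.choice(batch)
                new_wav.extend(other_wav)
                # Assumption: a space is a valid separator for the language.
                new_text = f"{new_text} {other_text}"
        augmented.append((new_wav, new_text))
    return augmented


if __name__ == "__main__":
    # Toy batch: four "utterances" of 10 samples each.
    toy = [([float(i)] * 10, f"w{i}") for i in range(4)]
    for wav, text in random_utterance_concat(toy, rng=random.Random(0)):
        print(len(wav), text)
```

Because the concatenation is applied each time a batch is drawn, the model sees a different mix of short and lengthened utterances in every epoch, which is one plausible way to keep short-utterance performance while covering longer inputs.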

Citation

Pages: 904-908

Page count: 5
