Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing

Cited by: 13

Authors:

Mi, Chenggang ^{[1]}

Xie, Lei ^{[2]}

Zhang, Yanning ^{[2]}

Affiliations:

[1] Xian Int Studies Univ, Foreign Language & Literature Inst, Xian, Peoples R China

[2] Northwestern Polytech Univ, Sch Comp Sci, Natl Engn Lab Integrated AeroSp Ground Ocean Big, Xian, Peoples R China

Source:

NEURAL NETWORKS | 2022 / Vol. 148

Funding:

National Natural Science Foundation of China;

Keywords:

Data augmentation; Speech translation; Paraphrasing; FRAMEWORK;

DOI:

10.1016/j.neunet.2022.01.016

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A high-quality end-to-end speech translation model relies on large-scale speech-to-text training data, which is usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource language speech translation. In particular, we first generate large-scale target-side paraphrases with a paraphrase generation model that incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN) feature. Then, a filtering model that combines semantic similarity and speech-word pair co-occurrence is proposed to select the highest-scoring source speech-target paraphrase pairs from the candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian and Swedish paraphrase generation show that the proposed method achieves significant and consistent improvements over several strong baseline models on PPDB datasets (http://paraphrase.org/). To introduce the results of paraphrase generation into low-resource speech translation, we propose two strategies: audio-text pair recombination and multiple-reference training. Experimental results show that speech translation models trained on the new audio-text datasets, which incorporate the paraphrase generation results, achieve substantial improvements over baselines, especially on low-resource languages. (C) 2022 Elsevier Ltd. All rights reserved.

Citation

Pages: 194-205

Page count: 12

53 references in total

[1] Convolutional Neural Networks for Speech Recognition
Abdel-Hamid, Ossama
Mohamed, Abdel-Rahman
Jiang, Hui
Deng, Li
Penn, Gerald
Yu, Dong
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1533 - 1545
[2] Agarap Abien Fred, 2018, CoRR
[3] [Anonymous], 2005, P 43 ANN M ASS COMP
[4] Bahar P., 2019, ABS191108876 ARXIV
[5] Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[6] What Is a Paraphrase?
Bhagat, Rahul
Hovy, Eduard
[J]. COMPUTATIONAL LINGUISTICS, 2013, 39 (03) : 463 - 472
[7] An empirical study of smoothing techniques for language modeling
Chen, SF
Goodman, J
[J]. COMPUTER SPEECH AND LANGUAGE, 1999, 13 (04) : 359 - 394
[8] Chung YA, 2019, INT CONF ACOUST SPEE, P7170, DOI 10.1109/ICASSP.2019.8683550
[9] Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
Chung, Yu-An
Glass, James
[J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 811 - 815
[10] Conneau A., 2017, P 2017 C EMPIRICAL M, P670, DOI 10.18653/v1/D17-1070
