AN EMPIRICAL STUDY OF TRANSFORMER-BASED NEURAL LANGUAGE MODEL ADAPTATION

Cited: 0
Authors
Li, Ke [1 ,2 ]
Liu, Zhe [1 ]
He, Tianxing [3 ]
Huang, Hongzhao [1 ]
Peng, Fuchun [1 ]
Povey, Daniel
Khudanpur, Sanjeev [2 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
[2] Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD 21218 USA
[3] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
neural language model; language model adaptation; Transformer; linear interpolation; automatic speech recognition
DOI
10.1109/icassp40776.2020.9053399
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We explore two approaches to adapting deep Transformer-based neural language models (LMs) for automatic speech recognition. The first is a pretrain-finetune framework: we pretrain a Transformer LM from scratch on a large-scale text corpus and then adapt it to relatively small target domains via finetuning. The second is a mixer of dynamically weighted models trained separately on the source and target domains, which aims to improve on simple linear interpolation by weighting the component models dynamically. We compare the two approaches against three baselines (no adaptation, merging data, and simple interpolation) on Switchboard (SWBD) and Wall Street Journal (WSJ). Experiments show that the mixer model generally outperforms both the baselines and finetuning. Compared with no adaptation, finetuning and the mixer approach obtain up to 11.5% and 14.1% relative WER reductions on SWBD, respectively. The mixer model also outperforms linear interpolation and merging data. On WSJ, the mixer approach achieves a new state-of-the-art WER result.
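For both simple interpolation and the mixer, the core operation is a convex combination of the next-word distributions produced by the source- and target-domain LMs. The plain-Python sketch below is illustrative only: the fixed-weight call corresponds to simple linear interpolation, and the sigmoid-of-context weight is one plausible, hypothetical way to make the weight dynamic per time step; the function names and parameterization are assumptions, not the paper's implementation.

    import math

    def interpolate(p_src, p_tgt, lam):
        """Convex combination of two next-word distributions.

        p_src, p_tgt: probability vectors over a shared vocabulary, one
        from the source-domain LM and one from the target-domain LM.
        lam: mixture weight in [0, 1]. Simple linear interpolation tunes
        a single fixed lam on held-out data; a dynamic mixer instead
        predicts a fresh weight at every step from the running context.
        """
        return [lam * ps + (1.0 - lam) * pt for ps, pt in zip(p_src, p_tgt)]

    def dynamic_weight(hidden, w, b):
        """Hypothetical per-step weight: a sigmoid of a linear map of a
        context vector (e.g., a Transformer hidden state). One plausible
        parameterization, not necessarily the paper's."""
        z = sum(wi * hi for wi, hi in zip(w, hidden)) + b
        return 1.0 / (1.0 + math.exp(-z))

    # Toy usage: two 3-word distributions mixed with a fixed weight 0.3.
    print(interpolate([0.7, 0.2, 0.1], [0.1, 0.3, 0.6], 0.3))
    # -> [0.28, 0.27, 0.45]  (still a valid distribution: sums to 1)

Because the result remains a normalized distribution for any weight in [0, 1], the mixer can vary the weight per token without breaking probabilistic scoring.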
Pages: 7934-7938
Number of pages: 5