Injecting Descriptive Meta-information into Pre-trained Language Models with Hypernetworks

Cited by: 3
Authors
Duan, Wenying [1 ]
He, Xiaoxi [2 ]
Zhou, Zimu [3 ]
Rao, Hong [1 ]
Thiele, Lothar [2 ]
Affiliations
[1] Nanchang Univ, Nanchang, Jiangxi, Peoples R China
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] Singapore Management Univ, Singapore, Singapore
Source
INTERSPEECH 2021 | 2021
Funding
Swiss National Science Foundation
Keywords
descriptive meta-information; hypernetworks; pre-trained language model;
DOI
10.21437/Interspeech.2021-229
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes
100104; 100213
Abstract
Pre-trained language models have been widely adopted as backbones in various natural language processing tasks. However, existing pre-trained language models ignore descriptive meta-information in the text, such as the distinction between the title and the main body, leading to over-weighted attention on insignificant text. In this paper, we propose a hypernetwork-based architecture that models the descriptive meta-information and integrates it into pre-trained language models. Evaluations on three natural language processing tasks show that our method notably improves the performance of pre-trained language models and achieves state-of-the-art results on keyphrase extraction.
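The abstract does not specify the architecture, so the sketch below is a minimal, hypothetical illustration of the general idea only: a hypernetwork that maps a per-token meta-information label (e.g., title vs. main body) to modulation parameters applied to a pre-trained encoder's hidden states. All class names, shapes, and the choice of an elementwise affine transform are assumptions, not the paper's actual design.

```python
# Hypothetical sketch: injecting descriptive meta-information into a
# pre-trained encoder via a hypernetwork. Names and shapes are assumptions.
import torch
import torch.nn as nn


class MetaHyperNetwork(nn.Module):
    """Maps a meta-information type ID to per-token affine parameters."""

    def __init__(self, num_meta_types: int, hidden_size: int, meta_dim: int = 32):
        super().__init__()
        self.meta_embedding = nn.Embedding(num_meta_types, meta_dim)
        # Hypernetwork head: generates 2 * hidden_size parameters
        # (an elementwise scale and shift) from the meta embedding.
        self.param_generator = nn.Linear(meta_dim, 2 * hidden_size)

    def forward(self, meta_type_ids: torch.Tensor):
        # meta_type_ids: (batch, seq_len), e.g. 0 = title token, 1 = body token
        params = self.param_generator(self.meta_embedding(meta_type_ids))
        scale, shift = params.chunk(2, dim=-1)
        # Initialize near the identity so the pre-trained encoder's
        # behavior is preserved at the start of fine-tuning.
        return 1.0 + scale, shift


class MetaAwareEncoder(nn.Module):
    """Wraps a pre-trained encoder and modulates its hidden states."""

    def __init__(self, encoder: nn.Module, hidden_size: int, num_meta_types: int = 2):
        super().__init__()
        self.encoder = encoder
        self.hyper = MetaHyperNetwork(num_meta_types, hidden_size)

    def forward(self, input_ids, attention_mask, meta_type_ids):
        # Assumes a HuggingFace-style encoder returning last_hidden_state,
        # e.g. transformers.AutoModel.from_pretrained("bert-base-uncased").
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        scale, shift = self.hyper(meta_type_ids)
        return hidden * scale + shift  # meta-conditioned modulation
```

Under these assumptions, tokens from the title and the main body flow through the same encoder but receive different hypernetwork-generated transformations, which is one plausible way to realize the meta-information conditioning the abstract describes.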
Pages: 3216-3220
Number of pages: 5