MiDTD: A Simple and Effective Distillation Framework for Distantly Supervised Relation Extraction

Cited by: 6
Authors
Li, Rui [1 ]
Yang, Cheng [1 ]
Li, Tingwei [1 ]
Su, Sen [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Computer Science, National Pilot Software Engineering School, 10 Xitucheng Road, Beijing 100876, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; NLP; knowledge distillation; distant supervision; neural network; multi-instance learning; label softening
DOI
10.1145/3503917
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Relation extraction (RE), an important information extraction task, faces the great challenge posed by limited annotated data. To address this, distant supervision was proposed to label RE data automatically, greatly increasing the number of annotated instances. Unfortunately, the many noisy relation annotations introduced by automatic labeling become a new obstacle. Some recent studies have shown that the teacher-student framework of knowledge distillation can alleviate the interference of noisy relation annotations via label softening. Nevertheless, we find that these methods still suffer from two problems: propagation of inaccurate dark knowledge and the constraint of a unified distillation temperature. In this article, we propose a simple and effective Multi-instance Dynamic Temperature Distillation (MiDTD) framework, which is model-agnostic and mainly involves two modules: multi-instance target fusion (MiTF) and dynamic temperature regulation (DTR). MiTF combines the teacher's predictions for multiple sentences with the same entity pair to amend the inaccurate dark knowledge in each of the student's targets. DTR allocates adjustable distillation temperatures to different training instances, so that the softness of most of the student's targets can be regulated to a moderate range. In experiments, we construct three concrete MiDTD instantiations with BERT-, PCNN-, and BiLSTM-based RE models, and the distilled students significantly outperform their teachers and the state-of-the-art (SOTA) methods.
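To make the two modules concrete, the following is a minimal PyTorch sketch of the ideas described in the abstract. The function names (fuse_bag_targets, dynamic_temperature, distill_loss), the mean-pooling fusion rule, and the entropy-based temperature heuristic are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Illustrative sketch only; the fusion rule and temperature heuristic
# below are assumptions, not MiDTD's published equations.
import torch
import torch.nn.functional as F

def fuse_bag_targets(teacher_logits: torch.Tensor) -> torch.Tensor:
    """MiTF-style fusion (assumed): average the teacher's predicted
    distributions over all sentences in a bag (same entity pair) to
    smooth out inaccurate dark knowledge in any single sentence.
    teacher_logits: [bag_size, num_relations] -> [num_relations]."""
    probs = F.softmax(teacher_logits, dim=-1)  # per-sentence distributions
    return probs.mean(dim=0)                   # simple mean fusion (assumption)

def dynamic_temperature(fused_target: torch.Tensor,
                        t_min: float = 1.0, t_max: float = 8.0) -> torch.Tensor:
    """DTR-style per-instance temperature (assumed): sharper fused
    targets get a higher temperature (more softening), already-soft
    targets get a lower one, keeping softness in a moderate range."""
    entropy = -(fused_target * fused_target.clamp_min(1e-12).log()).sum()
    max_entropy = torch.log(torch.tensor(float(fused_target.numel())))
    sharpness = 1.0 - entropy / max_entropy    # 1 = one-hot, 0 = uniform
    return t_min + (t_max - t_min) * sharpness

def distill_loss(student_logits: torch.Tensor, fused_target: torch.Tensor,
                 temperature: torch.Tensor) -> torch.Tensor:
    """Standard temperature-scaled KL distillation loss, with the
    instance-specific temperature applied to both sides."""
    soft_target = F.softmax(
        fused_target.clamp_min(1e-12).log().unsqueeze(0) / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_target,
                    reduction="batchmean") * temperature ** 2

# Example: a bag of 3 sentences for one entity pair, 5 relation classes.
teacher_logits = torch.randn(3, 5)
student_logits = torch.randn(1, 5)
target = fuse_bag_targets(teacher_logits)
T = dynamic_temperature(target)
loss = distill_loss(student_logits, target, T)
```

Because the sketch only manipulates probability distributions and logits, it is model-agnostic in the same sense as the abstract: the teacher and student here could be any of the BERT-, PCNN-, or BiLSTM-based RE models mentioned above.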
Pages: 32
相关论文
共 81 条
  • [1] Alt C, 2019, 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), P1388
  • [2] Ando A, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P4964, DOI 10.1109/ICASSP.2018.8461299
  • [3] Angeli G, 2015, PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1, P344
  • [4] Bari M. Saiful, 2020, ARXIV 2004
  • [5] Blessing Andre, 2012, CROSSLINGUAL DISTANT, P1123, DOI [10.1145/2396761.2398411, DOI 10.1145/2396761.2398411]
  • [6] Bordes A., 2013, P 26 INT C NEUR INF, V2, P2787
  • [7] Nested Relation Extraction with Iterative Neural Network
    Cao, Yixuan
    Chen, Dian
    Li, Hongwei
    Luo, Ping
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 1001 - 1010
  • [8] Choi M, 2012, SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, P1047, DOI 10.1145/2348283.2348462
  • [9] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
  • [10] Relation Extraction via Domain-aware Transfer Learning
    Di, Shimin
    Shen, Yanyan
    Chen, Lei
    [J]. KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 1348 - 1357