Virtual prompt pre-training for prototype-based few-shot relation extraction

Cited by: 44
Authors
He, Kai [1 ,2 ]
Huang, Yucheng [1 ,2 ]
Mao, Rui [3 ]
Gong, Tieliang [1 ,2 ]
Li, Chen [1 ,2 ]
Cambria, Erik [3 ,4 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Shaanxi Prov Key Lab Satellite & Terr Network Techn, Xian, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, 50 Nanyang Ave,Block N4 02a, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Few-shot learning; Information extraction; Prompt tuning; Pre-trained Language Model; SENTIMENT ANALYSIS; LANGUAGE MODELS;
DOI
10.1016/j.eswa.2022.118927
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt tuning with pre-trained language models (PLMs) has exhibited outstanding performance by reducing the gap between pre-training tasks and various downstream applications, but it requires additional labor for label word mapping and prompt template engineering. In a label-intensive research domain such as few-shot relation extraction (RE), manually defining label word mappings is particularly challenging, because the number of relation classes with complex relation names can be extremely large. Moreover, manual prompt development in natural language is subject to individual subjectivity. To tackle these issues, we propose a virtual prompt pre-training method that projects the virtual prompt into a latent space and then fuses it with the PLM parameters. The pre-training is entity-relation-aware for RE, comprising the tasks of masked entity prediction, entity typing, distantly supervised RE, and contrastive prompt pre-training. The proposed pre-training provides a robust initialization for prompt encoding while maintaining interaction with the PLM. Furthermore, the virtual prompt effectively avoids the labor and subjectivity issues of label word mapping and prompt template engineering. Our prompt-based prototype network delivers a novel learning paradigm that models entities and relations via the probability distributions and Euclidean distances of the predictions of query instances and prototypes. The results indicate that our model yields an average accuracy gain of 4.21% over strong RE baselines on two few-shot datasets. Built on the proposed framework, our pre-trained model outperforms the strongest RE-related PLM by 6.52%.
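The abstract names two mechanisms: learned virtual prompt vectors fused with the PLM input in place of hand-written template words, and a prototype network that classifies query instances by their Euclidean distance to per-class prototypes. The PyTorch snippet below is a minimal illustrative sketch of those two ideas only, not the authors' released implementation; the class name VirtualPromptPrototypeRE, the prompt length, and the choice of the first hidden state as the relation representation are assumptions for illustration, and the plm argument is assumed to be a HuggingFace-style encoder that accepts inputs_embeds and returns last_hidden_state.

```python
# Illustrative sketch only (assumed names and interfaces, not the paper's code).
import torch
import torch.nn as nn


class VirtualPromptPrototypeRE(nn.Module):
    def __init__(self, plm, hidden_size: int, prompt_len: int = 4):
        super().__init__()
        self.plm = plm  # assumed: encoder accepting inputs_embeds (e.g., a BERT-style model)
        # Virtual (soft) prompt: trainable vectors in the PLM embedding space,
        # prepended to token embeddings instead of hand-written template words.
        self.virtual_prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def encode(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: [batch, seq_len, hidden]; prepend the virtual prompt tokens.
        batch = input_embeds.size(0)
        prompt = self.virtual_prompt.unsqueeze(0).expand(batch, -1, -1)
        fused = torch.cat([prompt, input_embeds], dim=1)
        hidden = self.plm(inputs_embeds=fused).last_hidden_state
        return hidden[:, 0]  # first position used as the relation representation (assumption)

    def forward(self, support_embeds, support_labels, query_embeds, n_way: int):
        # Prototype = mean support representation of each relation class.
        support_repr = self.encode(support_embeds)   # [n_support, hidden]
        query_repr = self.encode(query_embeds)       # [n_query, hidden]
        prototypes = torch.stack(
            [support_repr[support_labels == c].mean(dim=0) for c in range(n_way)]
        )                                            # [n_way, hidden]
        # Negative squared Euclidean distance serves as class logits;
        # a softmax over them yields the class probability distribution.
        dists = torch.cdist(query_repr, prototypes) ** 2
        return -dists
```

In a typical N-way K-shot episode, the K support embeddings per class would be encoded to form the N prototypes, and each query would be assigned to the nearest prototype; attention masks and the entity-aware pre-training objectives described in the abstract are omitted here for brevity.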
Pages: 11