Agent-DA: Enhancing low-resource event extraction with collaborative multi-agent data augmentation

Cited by: 0

Authors:

Tian, Xuemeng [1]

Guo, Yikai [2]

Ge, Bin [1]

Yuan, Xiaoguang [3]

Zhang, Hang [2]

Yang, Yuting [2,4]

Ke, Wenjun [4,5]

Li, Guozheng [4]

Affiliations:

[1] Natl Univ Def Technol, Lab Big Data & Decis, Changsha 410073, Peoples R China

[2] Beijing Inst Comp Technol & Applicat, Beijing 100039, Peoples R China

[3] Natl Univ Def Technol, Natl Key Lab Informat Syst Engn, Changsha 410073, Peoples R China

[4] Southeast Univ, Sch Comp Sci & Engn, Nanjing 211189, Peoples R China

[5] Southeast Univ, Key Lab New Generat Artificial Intelligence Techno, Minist Educ, Nanjing 211189, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 305

Keywords:

Event extraction; Data augmentation; Multi-agent;

DOI:

10.1016/j.knosys.2024.112625

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Low-resource event extraction presents a significant challenge in real-world applications, particularly in domains such as pharmaceuticals, military, and law, where data is frequently insufficient. Data augmentation, as a direct method for expanding samples, is considered an effective solution. However, existing data augmentation methods often suffer from text fluency issues and label hallucination. To address these challenges, we propose a framework called Agent-DA, which leverages multi-agent collaboration for event extraction data augmentation. Specifically, Agent-DA follows a three-step process: data generation by a large language model, collaborative filtering by both the large language model and a small language model to discriminate easy samples, and the use of an adjudicator to identify hard samples. Through iterative and selective augmentation, our method significantly enhances both the quantity and quality of event samples, improving text fluency and label consistency. Extensive experiments on the ACE2005-EN and ACE2005-EN+ datasets demonstrate the effectiveness of Agent-DA, with F1-score improvements ranging from 0.15% to 16.18% in trigger classification and from 2.2% to 15.67% in argument classification.
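
The three-step process described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how easy and hard samples are told apart, so the sketch assumes that a sample is "easy" when the two filters agree and "hard" when they disagree. The names `augment`, `generate`, `llm_accepts`, `slm_accepts`, and `adjudicate` are hypothetical stand-ins for the generator, the two filtering agents, and the adjudicator.

```python
from typing import Callable, Dict, List

Sample = Dict[str, object]  # hypothetical shape, e.g. {"text": ..., "events": ...}


def augment(
    seeds: List[Sample],
    generate: Callable[[Sample], List[Sample]],  # stand-in for LLM data generation
    llm_accepts: Callable[[Sample], bool],       # stand-in for the LLM filter
    slm_accepts: Callable[[Sample], bool],       # stand-in for the SLM filter
    adjudicate: Callable[[Sample], bool],        # stand-in for the adjudicator
    rounds: int = 1,
) -> List[Sample]:
    """Return the augmented samples kept after `rounds` iterations."""
    kept: List[Sample] = []
    pool = list(seeds)
    for _ in range(rounds):
        # Step 1: generate candidate samples from the current pool.
        candidates = [c for seed in pool for c in generate(seed)]
        accepted: List[Sample] = []
        for cand in candidates:
            llm_vote, slm_vote = llm_accepts(cand), slm_accepts(cand)
            if llm_vote == slm_vote:
                # Step 2 (assumption): agreement marks an easy sample;
                # keep it only if both filters accept.
                if llm_vote:
                    accepted.append(cand)
            elif adjudicate(cand):
                # Step 3 (assumption): disagreement marks a hard sample,
                # which is kept only if the adjudicator accepts it.
                accepted.append(cand)
        kept.extend(accepted)
        # Iterate: accepted samples seed the next round (also an assumption).
        pool = accepted
    return kept
```

Passing the agents in as plain callables keeps the sketch independent of any particular model or API; in practice each callable would wrap a model call.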

Pages: 15

