MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

Authors
Wang, Steven H. [1]
Scardigli, Antoine [1]
Tang, Leonard [2]
Chen, Wei [3]
Levkin, Dimitry [3]
Chen, Anya [4]
Ball, Spencer [5]
Woodside, Thomas [6]
Zhang, Oliver [7]
Hendrycks, Dan [8]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Harvard Univ, Cambridge, MA USA
[3] Atticus Project, Dorchester, England
[4] Nueva Sch, San Mateo, CA USA
[5] Univ Wisconsin, Madison, WI USA
[6] Yale Univ, New Haven, CT USA
[7] Stanford Univ, Stanford, CA USA
[8] Univ Calif Berkeley, Berkeley, CA USA
Source
2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) | 2023
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Pages: 16369-16382
Page count: 14