The Utility of Context When Extracting Entities From Legal Documents

Cited by: 4
Authors
Donnelly, Jonathan [1]
Roegiest, Adam [1]
Affiliations
[1] Kira Syst, Toronto, ON, Canada
Source
CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT | 2020
DOI
10.1145/3340531.3412746
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) needs to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process. Due to the extremely low prevalence of target information in legal documents, we find that the traditional approach of tagging all sentences in a document is inferior, in both effectiveness and the data required to train and predict, to using a first-pass layer to identify sentences that are likely to contain the relevant information and then running the more traditional sentence-level sequence tagging. Moreover, we find that such entity-level models can be improved by training on a balanced sample of relevant and non-relevant sentences. We additionally describe the use of our system in production and how its usage by clients means that deep learning architectures tend to be cost-inefficient, especially with respect to the time needed to train models.
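The two-stage idea in the abstract (filter sentences first, then tag only the survivors) and the balanced training sample can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the cue-word filter stands in for a trained first-pass classifier, the date regex stands in for a sentence-level sequence tagger, and the 1:1 ratio is just one reading of "balanced". All names (`CUE_WORDS`, `is_candidate`, `tag_entities`, `extract`, `balanced_sample`) are hypothetical.

```python
import random
import re

# Hypothetical stand-in for a trained first-pass sentence classifier:
# a sentence is a candidate if it contains any of these cue words.
CUE_WORDS = {"commence", "commences", "effective", "terminate", "expires"}

# Hypothetical stand-in for a sentence-level sequence tagger:
# a regex that picks out dates such as "January 1, 2020".
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_PATTERN = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")


def is_candidate(sentence):
    """Stage 1: flag sentences likely to contain target information."""
    tokens = {tok.strip(".,;:").lower() for tok in sentence.split()}
    return bool(tokens & CUE_WORDS)


def tag_entities(sentence):
    """Stage 2: extract entity spans from a single sentence."""
    return DATE_PATTERN.findall(sentence)


def extract(sentences):
    """Run the tagger only on sentences that pass the first-pass filter."""
    return [(s, tag_entities(s)) for s in sentences if is_candidate(s)]


def balanced_sample(relevant, non_relevant, seed=0):
    """Draw equal numbers of relevant and non-relevant sentences (assumed 1:1)."""
    rng = random.Random(seed)
    k = min(len(relevant), len(non_relevant))
    return rng.sample(relevant, k) + rng.sample(non_relevant, k)


document = [
    "This Agreement shall commence on January 1, 2020.",
    "The parties met on March 3, 2019 for lunch.",
    "Governing law is Ontario.",
]
results = extract(document)
```

In this sketch the second sentence contains a date but is never passed to the tagger, because the first-pass filter does not flag it. That is the intended effect of the filter when target information is rare: the tagger sees far fewer sentences and is not asked to reject dates in irrelevant context.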
Pages: 2397-2404
Page count: 8