Automated Question Answering for Improved Understanding of Compliance Requirements: A Multi-Document Study

Cited by: 19
Authors
Abualhaija, Sallam [1]
Arora, Chetan [1, 2]
Sleimi, Amin [1]
Briand, Lionel C. [1, 3]
Affiliations
[1] Univ Luxembourg, SnT Ctr Secur Reliabil & Trust, Luxembourg, Luxembourg
[2] Deakin Univ, Geelong, Vic, Australia
[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON, Canada
Source
2022 30TH IEEE INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE (RE 2022) | 2022
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Requirements Engineering; Regulatory Compliance; Natural Language Processing (NLP); Question Answering; Language Models (LMs); BERT;
DOI
10.1109/RE54965.2022.00011
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Software systems are increasingly subject to regulatory compliance. Extracting compliance requirements from regulations is challenging. Ideally, locating compliance-related information in a regulation requires a joint effort from requirements engineers and legal experts, whose availability is limited. However, regulations are typically long documents spanning hundreds of pages, containing legal jargon, using complex natural-language structures, and including cross-references, which makes their analysis effort-intensive. In this paper, we propose an automated question-answering (QA) approach that assists requirements engineers in finding the legal text passages relevant to compliance requirements. Our approach utilizes large-scale language models fine-tuned for QA, including BERT and three variants. We evaluate our approach on 107 question-answer pairs, manually curated by subject-matter experts, for four different European regulatory documents. Among these documents is the General Data Protection Regulation (GDPR), a major source of privacy-related requirements. Our empirical results show that, in approximately 94% of the cases, our approach finds the text passage containing the answer to a given question among the top five passages that it marks as most relevant. Further, our approach successfully demarcates the right answer in the selected passage with an average accuracy of approximately 91%.
Pages: 39-50
Number of pages: 12
Related papers
54 in total
[21]  
Heie M.H., 2010, P ACL 2010 ACL, P236
[22]  
Hu MH, 2018, 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), P2077
[23]  
Jurafsky Daniel, 2021, Speech and Language Processing, V3rd
[24]   Canary: An Interactive and Query-Based Approach to Extract Requirements from Online Forums [J].
Kanchev, Georgi M. ;
Murukannaiah, Pradeep K. ;
Chopra, Amit K. ;
Sawyer, Pete .
2017 IEEE 25TH INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE (RE), 2017, :470-471
[25]   Relevance-guided Supervision for OpenQA with ColBERT [J].
Khattab, Omar ;
Potts, Christopher ;
Zaharia, Matei .
TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9 :929-944
[26]  
Khazaeli S, 2021, P NATURAL LEGAL LAN, P107
[27]  
Kien P. M., 2020, P 28 INT C COMPUTATI
[28]  
Klaus Pohl and Chris Rupp, 2011, Requirements Engineering Fundamentals: A Study Guide for the Certified Professional for Requirements Engineering Exam - Foundation Level - IREB Compliant, V1st
[29]   Jupyter Notebooks-a publishing format for reproducible computational workflows [J].
Kluyver, Thomas ;
Ragan-Kelley, Benjamin ;
Perez, Fernando ;
Granger, Brian ;
Bussonnier, Matthias ;
Frederic, Jonathan ;
Kelley, Kyle ;
Hamrick, Jessica ;
Grout, Jason ;
Corlay, Sylvain ;
Ivanov, Paul ;
Avila, Damián ;
Abdalla, Safia ;
Willing, Carol .
POSITIONING AND POWER IN ACADEMIC PUBLISHING: PLAYERS, AGENTS AND AGENDAS, 2016, :87-90
[30]  
Lan ZZ, 2020, Arxiv, DOI [arXiv:1909.11942, DOI 10.48550/ARXIV.1909.11942]