Towards Understanding Contracts Grammar: A Large Language Model-based Extractive Question-Answering Approach

Cited by: 0

Authors:

Rejithkumar, Gokul [1]

Anish, Preethu Rose [1]

Ghaisas, Smita [1]

Affiliations:

[1] TCS Res, Pune, India

Source:

32ND IEEE INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE, RE 2024 | 2024

Keywords:

text extraction; deep learning; natural language processing; large language models; question-answering; token classification; text-to-text generation; prompting; empirical research

DOI:

10.1109/RE59067.2024.00037

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Software Engineering (SE) contracts play a pivotal role in Information Technology Outsourcing (ITO) projects. The obligations in SE contracts are known to be a useful source for deriving software requirements, thereby contributing to the overall Software Development Life Cycle (SDLC). Making sense of contractual obligations is an important first step in successfully executing software projects. This includes building compliant systems, meeting delivery deadlines, avoiding heavy penalties, and steering clear of expensive litigation. In this work, we present an approach to capture the essence of a contractual clause by extracting its Contracts Grammar. Through an exploratory study, we first identify the constituents of Contracts Grammar. Subsequently, we experiment with multiple approaches for the automated extraction of these constituents, including extractive question-answering, token classification, text-to-text generation, prompting, and regular expressions. The question-answering based approach performed best, with a high average ROUGE-L score of 0.81 and faster inference times. The work presented in this paper is a part of the Contracts Governance System (CGS) and is in the process of deployment within a large IT vendor organization.
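
For readers unfamiliar with the ROUGE-L score cited in the abstract, the following is a minimal sketch of how such a score is commonly computed: the longest common subsequence (LCS) between a reference span and an extracted span is turned into precision, recall, and an F-measure. The whitespace tokenization, lowercasing, the balanced F1 weighting, the function names, and the example clause are all assumptions made for illustration; the record does not state which ROUGE-L variant or tooling the authors used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as a balanced F1 over LCS precision and recall.

    Assumption: lowercase, whitespace-split tokens and beta = 1.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a gold answer span versus a model-extracted span.
gold = "the vendor shall deliver the software"
predicted = "vendor shall deliver software"
score = rouge_l_f1(gold, predicted)
```

In this hypothetical example the LCS covers all four predicted tokens, so precision is 1 and recall is 4/6. An average of such per-example scores over a test set is one plausible reading of the "average ROUGE-L score" reported above.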


Pages: 310-320

Page count: 11

50 items in total

[31] Developing and Pre-Processing a Dataset using a Rhetorical Relation to Build a Question-Answering System based on an Unsupervised Learning Approach
Dutta, Ashit Kumar
Sait, Abdul Rahaman Wahab
Keshta, Ismail Mohamed
Elhalles, Abheer
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2021, 21 (11): 199-206
[32] OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models
Maharjan, Jenish
Garikipati, Anurag
Singh, Navan Preet
Cyrus, Leo
Sharma, Mayank
Ciobanu, Madalina
Barnes, Gina
Thapa, Rahul
Mao, Qingqing
Das, Ritankar
SCIENTIFIC REPORTS, 2024, 14 (01)
[34] Slit Lamp Report Generation and Question Answering: Development and Validation of a Multimodal Transformer Model with Large Language Model Integration
Zhao, Ziwei
Zhang, Weiyi
Chen, Xiaolan
Song, Fan
Gunasegaram, James
Huang, Wenyong
Shi, Danli
He, Mingguang
Liu, Na
JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
[35] Text Matching in Insurance Question-Answering Community Based on an Integrated BiLSTM-TextCNN Model Fusing Multi-Feature
Li, Zhaohui
Yang, Xueru
Zhou, Luli
Jia, Hongyu
Li, Wenli
ENTROPY, 2023, 25 (04)
[36] Large language model-based evolutionary optimizer: Reasoning with elitism
Brahmachary, Shuvayan
Joshi, Subodh M.
Panda, Aniruddha
Koneripalli, Kaushik
Sagotra, Arun Kumar
Patel, Harshil
Sharma, Ankush
Jagtap, Ameya D.
Kalyanaraman, Kaushic
NEUROCOMPUTING, 2025, 622
[37] Question Answering based Clinical Text Structuring Using Pre-trained Language Model
Qiu, Jiahui
Zhou, Yangming
Ma, Zhiyuan
Ruan, Tong
Liu, Jinlin
Sun, Jing
2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019: 1596-1600
[38] KFEX-N: A table-text data question-answering model based on knowledge-fusion encoder and EX-N tree decoder
Tao, Ye
Liu, Jiawang
Li, Hui
Cao, Wenqian
Qin, Xiugong
Tian, Yunlong
Du, Yongjie
NEUROCOMPUTING, 2024, 593
[39] Enhancing Embedding Performance through Large Language Model-based Text Enrichment and Rewriting
Harris, Nicholas
Butani, Anand
Hashmy, Syed
ADVANCES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, 2024, 4 (02): 2358-2368
[40] Intelligent question answering for water conservancy project inspection driven by knowledge graph and large language model collaboration
Yang, Yangrui
Chen, Sisi
Zhu, Yaping
Liu, Xuemei
Pan, Shifeng
Wang, Xin
LHB-HYDROSCIENCE JOURNAL, 2024, 110 (01)
