A Machine Learning-Based Approach for Demarcating Requirements in Textual Specifications

Cited by: 23

Authors:

Abualhaija, Sallam [1]

Arora, Chetan [1]

Sabetzadeh, Mehrdad [1]

Briand, Lionel C. [1, 2]

Vaz, Eduardo [3]

Affiliations:

[1] Univ Luxembourg, SnT Ctr Secur Reliabil & Trust, Luxembourg, Luxembourg

[2] Univ Ottawa, Sch Engn & Comp Sci, Ottawa, ON, Canada

[3] QRA Corp, Halifax, NS, Canada

Source:

2019 27TH IEEE INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE (RE 2019) | 2019

Funding:

European Research Council;

Keywords:

Textual Requirements; Requirements Identification and Classification; Machine Learning; Natural Language Processing;

DOI:

10.1109/RE.2019.00017

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

A simple but important task during the analysis of a textual requirements specification is to determine which statements in the specification represent requirements. In principle, by following suitable writing and markup conventions, one can provide an immediate and unequivocal demarcation of requirements at the time a specification is being developed. However, neither the presence nor a fully accurate enforcement of such conventions is guaranteed. The result is that, in many practical situations, analysts end up resorting to after-the-fact reviews for sifting requirements from other material in a requirements specification. This is both tedious and time-consuming. We propose an automated approach for demarcating requirements in free-form requirements specifications. The approach, which is based on machine learning, can be applied to a wide variety of specifications in different domains and with different writing styles. We train and evaluate our approach over an independently labeled dataset comprised of 30 industrial requirements specifications. Over this dataset, our approach yields an average precision of 81.2% and an average recall of 95.7%. Compared to simple baselines that demarcate requirements based on the presence of modal verbs and identifiers, our approach leads to an average gain of 16.4% in precision and 25.5% in recall.
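To make the baseline comparison concrete, the following is a minimal sketch of a rule-based demarcator of the kind the abstract mentions, together with the precision and recall computation. The modal-verb list, the identifier pattern, the choice to combine the two cues with a logical OR, and the sample statements are all illustrative assumptions; the abstract does not specify any of them, and this is not the authors' implementation.

```python
import re

# Assumed modal-verb list; the abstract does not say which modals the baselines use.
MODAL_VERBS = {"shall", "must", "should", "will"}

# Assumed identifier shapes: a short uppercase prefix followed by digits
# ("REQ-12", "R1.2") or a dotted section number ("3.1.2") at the start of a statement.
IDENTIFIER_PATTERN = re.compile(
    r"^\s*(?:[A-Z]{1,5}[-_. ]?\d+(?:\.\d+)*|\d+(?:\.\d+)+)\b"
)


def is_requirement_baseline(statement: str) -> bool:
    """Flag a statement if it contains a modal verb or starts with an identifier."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    has_modal = bool(words & MODAL_VERBS)
    has_identifier = IDENTIFIER_PATTERN.match(statement) is not None
    # Combining the two cues with OR is an assumption made for this sketch.
    return has_modal or has_identifier


def precision_recall(predicted, actual):
    """Return (precision, recall) for two parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical statements with hand-assigned labels (True = requirement).
statements = [
    ("REQ-12 The system shall log every login attempt.", True),
    ("This section describes the logging subsystem.", False),
    ("The operator will be trained before deployment.", False),
    ("Login attempts are recorded in the audit trail.", True),
]

predicted = [is_requirement_baseline(text) for text, _ in statements]
actual = [label for _, label in statements]
precision, recall = precision_recall(predicted, actual)
```

In this toy sample the third statement is a false positive (a modal verb in non-requirement text) and the fourth is a false negative (a requirement with neither a modal verb nor an identifier), which illustrates the two kinds of error that a learned classifier is meant to reduce.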


Pages: 51-62

Page count: 12
