Using semantic roles to improve text classification in the requirements domain

Cited by: 0

Authors:

Alejandro Rago

Claudia Marcos

J. Andres Diaz-Pace

Affiliations:

[1] UNICEN University, ISISTAN Research Institute

[2] CONICET

[3] CIC

Source:

Language Resources and Evaluation | 2018 / Vol. 52

Keywords:

Text classification; Natural language processing; Knowledge representation; Semantic enrichment; Use case specification

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Engineering activities often produce considerable documentation as a by-product of the development process. Because these documents are complex, technical analysts can benefit from text processing techniques able to identify concepts of interest and analyze deficiencies of the documents in an automated fashion. In practice, text sentences from the documentation are usually transformed to a vector space model, which is suitable for traditional machine learning classifiers. However, such transformations suffer from problems of synonymy and ambiguity that cause classification mistakes. To alleviate these problems, there has been a growing interest in the semantic enrichment of text. Unfortunately, using general-purpose thesauri and encyclopedias to enrich technical documents belonging to a given domain (e.g., requirements engineering) often introduces noise and does not improve classification. In this work, we aim at boosting text classification by exploiting information about semantic roles. We have explored this approach when building a multi-label classifier for identifying special concepts, called domain actions, in textual software requirements. After evaluating various combinations of semantic roles and text classification algorithms, we found that this kind of semantically enriched data leads to improvements of up to 18% in both precision and recall, when compared to non-enriched data. Our enrichment strategy based on semantic roles also allowed classifiers to reach acceptable accuracy levels with small training sets. Moreover, semantic roles outperformed Wikipedia- and WordNet-based enrichments, which failed to boost requirements classification with several techniques. These results drove the development of two requirements tools, which we successfully applied in the processing of textual use cases.
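
The idea of enriching a vector space model with semantic roles can be illustrated with a minimal sketch. This is not the authors' implementation: the role names (`AGENT`, `ACTION`, `PATIENT`), the hand-written annotations, and the helper functions `enrich` and `vectorize` are all assumptions made for illustration, and a real pipeline would obtain the roles from a semantic role labeling tool. The sketch only shows how role-qualified features let a bag-of-words representation distinguish the same word playing different roles.

```python
from collections import Counter

def enrich(tagged_sentence):
    """Return plain tokens plus role-qualified tokens such as 'AGENT=user'.

    tagged_sentence is a list of (token, role) pairs; role may be None.
    The role labels are hypothetical, not taken from the paper.
    """
    features = []
    for token, role in tagged_sentence:
        word = token.lower()
        features.append(word)
        if role is not None:
            features.append(f"{role}={word}")
    return features

def vectorize(feature_lists):
    """Build a sorted vocabulary and one count vector per feature list."""
    vocabulary = sorted({f for feats in feature_lists for f in feats})
    vectors = []
    for feats in feature_lists:
        counts = Counter(feats)
        vectors.append([counts[f] for f in vocabulary])
    return vocabulary, vectors

# Hypothetical, hand-annotated requirement sentences.
s1 = [("The", None), ("user", "AGENT"), ("enters", "ACTION"),
      ("the", None), ("password", "PATIENT")]
s2 = [("The", None), ("system", "AGENT"), ("notifies", "ACTION"),
      ("the", None), ("user", "PATIENT")]

# Non-enriched bag of words: "user" looks the same in both sentences.
plain_vocab, plain_vecs = vectorize([[t.lower() for t, _ in s] for s in (s1, s2)])

# Role-enriched bag of words: "user" as AGENT and as PATIENT are separate features.
rich_vocab, rich_vecs = vectorize([enrich(s) for s in (s1, s2)])
```

The resulting count vectors could then be passed to any standard multi-label classifier. Keeping the plain tokens alongside the role-qualified ones is one possible design choice: it preserves the original lexical signal while adding the role information as extra dimensions.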

Pages: 801-837

Page count: 36
