DocToModel: Automated Authoring of Models from Diverse Requirements Specification Documents

Cited by: 3
Authors
Rajbhoj, Asha [1]
Nistala, Padmalata [2]
Kulkarni, Vinay [1]
Soni, Shivani [1]
Pathan, Ajim [1]
Affiliations
[1] TCS Res, 54B Ind Estate, Pune, Maharashtra, India
[2] TCS Res, 1 Software Units Layout, Hyderabad, Telangana, India
Source
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE, ICSE-SEIP | 2023
Keywords
Meta-Model; Automated Model Authoring; Model Extraction; Document Parser; NLP; Meta-Model Pattern; Pattern Interpreter; INFORMATION EXTRACTION;
DOI
10.1109/ICSE-SEIP58684.2023.00024
Chinese Library Classification
TP31 [Computer Software];
Subject classification codes
081202 ; 0835 ;
Abstract
Early stages of the Software Development Life Cycle (SDLC), namely requirements elicitation and requirements analysis, have remained document-centric in the industry for market-driven, complex, large-scale business applications and products. The documentation typically runs into hundreds of Natural Language (NL) text documents, which requirements engineers need to sift through for relevant information and keep in sync over time, a time-consuming and error-prone activity. Much of this difficulty can be overcome if the information is available in a structured form that is amenable to automated processing. Purposive models offer a way out. However, for easy adoption by industry practitioners, these models must be populated from NL text documents in a largely automated manner. This task is characterized by high variability, with several documents containing different information conforming to different structures and styles. As a result, purposive information extractors need to be developed for each project or product. Moreover, because the space is open-ended, there is no upper bound on the number of information extractors that need to be developed. To overcome this difficulty, we propose a document-structure-agnostic and meta-model-agnostic tool, DocToModel, for the automated authoring of models from NL text documents. It provides a pattern mapping language to specify a mapping of structured and unstructured document information to meta-model elements, and a pattern interpreter to automate model authoring. The configurable and extensible architecture of DocToModel makes it generic and amenable to easy repurposing for other NL documents. This paper describes the approach and illustrates its utility and efficacy on multiple real-world case studies.
Pages: 196-207
Number of pages: 12