FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code

Authors
Nemania Borovits
Indika Kumara
Dario Di Nucci
Parvathy Krishnan
Stefano Dalla Palma
Fabio Palomba
Damian A. Tamburri
Willem-Jan van den Heuvel
Affiliations
[1] Tilburg University, Jheronimus Academy of Data Science
[2] University of Salerno, Jheronimus Academy of Data Science
[3] Technical University Eindhoven
Source
Empirical Software Engineering, 2022, Vol. 27, No. 7
Keywords
Infrastructure as code; Linguistic anti-patterns; Word embedding; Machine learning; Deep learning
DOI
Not available
Abstract
Linguistic anti-patterns are recurring poor practices concerning inconsistencies in the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in Infrastructure-as-Code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their short text names. To this end, we propose FindICI, a novel automated approach that employs word embedding and classification algorithms. We build the abstract syntax tree of each IaC code unit and use it to create code embeddings, which machine learning techniques then use to detect inconsistent IaC code units. We evaluated our approach in two experiments on Ansible tasks systematically extracted from open-source repositories, using various word embedding models and classification algorithms. Classical machine learning models and novel deep learning models with different word embedding methods showed comparable and satisfactory results in detecting inconsistent Ansible tasks related to the top-10 most used Ansible modules.
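To make the input of such an approach concrete, here is a minimal sketch of how an Ansible task might be split into its natural-language description and its code tokens. It assumes the task has already been parsed from YAML into a Python dict. The `name` key is treated as the description, and the remaining keys are flattened by a pre-order walk. That walk is a simplified stand-in for the AST traversal mentioned above, not FindICI's actual construction. The helper names (`split_identifier`, `flatten_body`, `task_to_pair`) are hypothetical. The embedding and classification stages are deliberately left out.

```python
from __future__ import annotations

import re
from typing import Any


def split_identifier(text: str) -> list[str]:
    """Lowercase word tokens from strings such as 'Install nginx' or 'apt_key'."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def flatten_body(node: Any, out: list[str]) -> None:
    """Pre-order walk over nested dicts/lists/scalars, emitting keys and
    scalar values as tokens (a simplified stand-in for an AST traversal)."""
    if isinstance(node, dict):
        for key, value in node.items():
            out.extend(split_identifier(str(key)))
            flatten_body(value, out)
    elif isinstance(node, list):
        for item in node:
            flatten_body(item, out)
    elif node is not None:
        out.extend(split_identifier(str(node)))


def task_to_pair(task: dict) -> tuple[list[str], list[str]]:
    """Split a parsed task into (description tokens, code tokens)."""
    name = task.get("name", "")
    body = {k: v for k, v in task.items() if k != "name"}
    code_tokens: list[str] = []
    flatten_body(body, code_tokens)
    return split_identifier(name), code_tokens


if __name__ == "__main__":
    # The name says "install", but the body removes the package: the kind of
    # name/body mismatch a trained classifier would be expected to flag.
    suspicious = {"name": "Install nginx", "apt": {"name": "nginx", "state": "absent"}}
    print(task_to_pair(suspicious))
    # (['install', 'nginx'], ['apt', 'name', 'nginx', 'state', 'absent'])
```

In a pipeline like the one the abstract outlines, both token sequences would next be mapped through a word embedding model and passed to a classifier that labels the pair as consistent or inconsistent.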