Predicting the objective and priority of issue reports in software repositories

Cited by: 39
Authors
Izadi, Maliheh [1]
Akbari, Kiana [1]
Heydarnoori, Abbas [1]
Affiliation
[1] Sharif Univ Technol, Intelligent Software Engn Lab, Tehran, Iran
Keywords
Software evolution and maintenance; Mining software repositories; Issue reports; Classification; Prioritization; Machine learning; Natural language processing; INTERRATER RELIABILITY; KAPPA; CODE; COEFFICIENT; USAGE
DOI
10.1007/s10664-021-10085-3
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
Software repositories such as GitHub host a large number of software entities, which developers collaboratively discuss, implement, use, and share. Proper documentation plays an important role in successful software management and maintenance. Users rely on Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage workloads and processes, and to document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge: it can describe a reported problem, request a new feature, or simply ask a question about the software product. As the number of issues grows, managing them manually becomes harder. GitHub provides labels for tagging issues as a means of issue management; however, about half of the issues in GitHub's top 1000 repositories have no labels at all. In this work, we aim to automate the management of issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The cross-project setting yields a generic prediction model applicable to any unseen software project, or to projects with little historical data. Our approach predicts the objective and the priority level of issue reports with 82% (fine-tuned RoBERTa) and 75% (Random Forest) accuracy, respectively. Moreover, to assess the cross-project model on new data, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects; the model achieves 90% accuracy on this sample set. We measured inter-rater reliability and obtained an average Percent Agreement of 85.3% and a Randolph's free-marginal kappa of 0.71, which indicate substantial agreement among labelers.
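The two reliability figures in the abstract are related by a simple formula. The sketch below is illustrative and is not the authors' code: it assumes a hypothetical item-by-category count table and uses one common (Fleiss-style) definition of observed agreement P_o, the share of agreeing rater pairs averaged over items. Randolph's free-marginal kappa then fixes chance agreement at 1/k for k categories: kappa = (P_o - 1/k) / (1 - 1/k).

```python
from typing import Optional, Sequence


def observed_agreement(counts: Sequence[Sequence[int]]) -> float:
    """Mean pairwise agreement P_o over all items (Fleiss-style).

    counts[i][j] is the number of raters who assigned item i to category j;
    every item must be rated by the same number n >= 2 of raters.
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n = sum(counts[0])
    if n < 2:
        raise ValueError("need at least two raters per item")
    total = 0.0
    for row in counts:
        if sum(row) != n:
            raise ValueError("every item must have the same number of raters")
        # Share of rater pairs that agree on this item.
        total += sum(c * (c - 1) for c in row) / (n * (n - 1))
    return total / len(counts)


def kappa_from_agreement(p_o: float, k: int) -> float:
    """Randolph's free-marginal kappa for k categories, given agreement p_o."""
    if k < 2:
        raise ValueError("need at least two categories")
    p_e = 1.0 / k  # chance agreement when raters may use any category freely
    return (p_o - p_e) / (1.0 - p_e)


def free_marginal_kappa(counts: Sequence[Sequence[int]], k: Optional[int] = None) -> float:
    """Free-marginal kappa computed from a raw item-by-category count table."""
    categories = k if k is not None else len(counts[0])
    return kappa_from_agreement(observed_agreement(counts), categories)


if __name__ == "__main__":
    # Hypothetical toy data: 4 issues, 3 raters, 2 categories.
    table = [[3, 0], [3, 0], [2, 1], [0, 3]]
    print(round(observed_agreement(table), 3))   # 0.833
    print(round(free_marginal_kappa(table), 3))  # 0.667
```

Because the kappa is a linear function of P_o once k is fixed, the two reported numbers can be cross-checked if the number of label categories is known. For instance, a hypothetical k = 2 would map an agreement of 0.853 to a kappa of about 0.71; the actual k used in the study is not stated here.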
Pages: 37
Related papers
  • [41] Mining software repositories for adaptive change commits using machine learning techniques
    Megdadi, Omar
    Alhindawi, Nouh
    Alsakran, Jamal
    Saifan, Ahmad
    Migdadi, Hatim
    INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 109 : 80 - 91
  • [42] Leveraging Models to Reduce Test Cases in Software Repositories
    Gharachorlu, Golnaz
    Sumner, Nick
    2021 IEEE/ACM 18TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR 2021), 2021, : 230 - 241
  • [43] Nine best practices for research software registries and repositories
    Garijo, Daniel
    Menager, Herve
    Hwang, Lorraine
    Trisovic, Ana
    Hucka, Michael
    Morrell, Thomas
    Allen, Alice
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [44] SamikshaUmbra: Contribution and Performance Assessment of Software Maintenance Professionals by Mining Software Repositories
    Rastogi, Ayushi
    Sureka, Ashish
    2013 20TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC 2013), VOL 2, 2013, : 170 - 175
  • [45] Changeset-Based Topic Modeling of Software Repositories
    Corley, Christopher S.
    Damevski, Kostadin
    Kraft, Nicholas A.
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2020, 46 (10) : 1068 - 1080
  • [46] Evolutionary Optimization of Software Quality Modeling with Multiple Repositories
    Liu, Yi
    Khoshgoftaar, Taghi M.
    Seliya, Naeem
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2010, 36 (06) : 852 - 864
  • [47] Quick remedy commits and their impact on mining software repositories
    Wen, Fengcai
    Nagy, Csaba
    Lanza, Michele
    Bavota, Gabriele
    EMPIRICAL SOFTWARE ENGINEERING, 2022, 27 (01)
  • [48] Mining Software Repositories for the Characterization of Continuous Integration and Delivery
    Destro, Gabriel Augusto
    Nicolau de Franca, Breno Bernard
    34TH BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING, SBES 2020, 2020, : 664 - 669
  • [49] HealthyEnv: a tool to assist in health assessment of software repositories
    Winter, Diego
    Avelino, Guilherme
    Miranda, Charles
    36TH BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING, SBES 2022, 2022, : 382 - 387