Predicting the objective and priority of issue reports in software repositories

Cited by: 39
Authors
Izadi, Maliheh [1]
Akbari, Kiana [1]
Heydarnoori, Abbas [1]
Affiliations
[1] Sharif Univ Technol, Intelligent Software Engn Lab, Tehran, Iran
Keywords
Software evolution and maintenance; Mining software repositories; Issue reports; Classification; Prioritization; Machine learning; Natural language processing; INTERRATER RELIABILITY; KAPPA; CODE; COEFFICIENT; USAGE;
DOI
10.1007/s10664-021-10085-3
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users exploit Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage the workload and processes, and finally, to document the highlight of their team's effort. An issue report is a rich source of collaboratively-curated software knowledge, and can contain a reported problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it becomes harder to manage them manually. GitHub provides labels for tagging issues, as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. In this work, we aim at automating the process of managing issue reports for software teams. We propose a two-stage approach to predict both the objective behind opening an issue and its priority level using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter approach provides a generic prediction model applicable for any unseen software project or projects with little historical data. Our proposed approach can successfully predict the objective and priority level of issue reports with 82% (fine-tuned RoBERTa) and 75% (Random Forest) accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy on the sample set. We measure inter-rater reliability and obtain an average Percent Agreement of 85.3% and Randolph's free-marginal Kappa of 0.71 that translate to a substantial agreement among labelers.
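The two agreement statistics named at the end of the abstract can be illustrated with a minimal sketch. It assumes the standard definition of Randolph's free-marginal kappa, in which chance agreement is fixed at 1/k for k rating categories. The function names, the example labels, and the toy ratings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def percent_agreement(ratings):
    """Mean pairwise agreement over items.

    `ratings` is a list of items; each item is a list holding one label
    per rater (at least two raters per item).
    """
    total = 0.0
    for item in ratings:
        n = len(item)
        counts = Counter(item)
        # Ordered pairs of raters that chose the same label for this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        total += agreeing_pairs / (n * (n - 1))
    return total / len(ratings)


def kappa_from_agreement(p_o, num_categories):
    """Free-marginal kappa from observed agreement p_o and k categories."""
    p_e = 1.0 / num_categories  # chance agreement under free marginals
    return (p_o - p_e) / (1.0 - p_e)


def free_marginal_kappa(ratings, num_categories):
    """Free-marginal kappa computed directly from raw ratings."""
    return kappa_from_agreement(percent_agreement(ratings), num_categories)


# Hypothetical toy data: four issues, three raters, three possible labels.
toy_ratings = [
    ["bug", "bug", "bug"],
    ["bug", "enhancement", "bug"],
    ["support", "support", "support"],
    ["enhancement", "enhancement", "support"],
]

if __name__ == "__main__":
    print(percent_agreement(toy_ratings))
    print(free_marginal_kappa(toy_ratings, num_categories=3))
```

Under this definition, an observed agreement of 0.853 with two categories gives (0.853 - 0.5) / (1 - 0.5) = 0.706, which is consistent with the reported 0.71. The abstract does not state the number of categories the labelers chose from, so the two-category reading is an inference from the arithmetic, not a claim from the source.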
Pages: 37
Related papers
50 items in total
  • [31] Mining Software Repositories Using Topic Models
    Thomas, Stephen W.
    2011 33RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2011, : 1138 - 1139
  • [32] Software Engineering Repositories: Expanding the PROMISE Database
    Lima, Marcia
    Valle, Victor
    Costa, Estevao
    Lira, Fylype
    Gadelha, Bruno
    PROCEEDINGS OF THE XXXIII BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING, SBES 2019, 2019, : 427 - 436
  • [33] MetricMiner: Supporting Researchers in Mining Software Repositories
    Sokol, Francisco Zigmund
    Aniche, Mauricio Finavaro
    Gerosa, Marco Aurelio
    2013 IEEE 13TH INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM), 2013, : 142 - 146
  • [34] A survey and taxonomy of approaches for mining software repositories in the context of software evolution
    Kagdi, Huzefa
    Collard, Michael L.
    Maletic, Jonathan I.
    JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION-RESEARCH AND PRACTICE, 2007, 19 (02): : 77 - 131
  • [35] Mining expertise of developers from software repositories
    Hammad, Maen
    Hijazi, Haneen
    Hammad, Mustafa
    Otoom, Ahmed Fawzi
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2020, 62 (03) : 227 - 239
  • [36] Research on mining software repositories to facilitate refactoring
    Nyamawe, Ally S.
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (05)
  • [37] Manas: Mining Software Repositories to Assist AutoML
    Nguyen, Giang
    Islam, Md Johirul
    Pan, Rangeet
    Rajan, Hridesh
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 1368 - 1380
  • [38] Mining Software Repositories to Identify Library Experts
    Santos, Adriano
    Souza, Mauricio
    Oliveira, Johnatan
    Figueiredo, Eduardo
    XII BRAZILIAN SYMPOSIUM ON SOFTWARE COMPONENTS, ARCHITECTURES, AND REUSE (SBCARS), 2018, : 83 - 91
  • [40] Automated classification of software issue reports using machine learning techniques: an empirical study
    Pandey N.
    Sanyal D.K.
    Hudait A.
    Sen A.
    Innovations in Systems and Software Engineering, 2017, 13 (4) : 279 - 297