Software reuse analytics using integrated random forest and gradient boosting machine learning algorithm

Cited by: 27
Authors
Sandhu, Amandeep Kaur [1]
Batth, Ranbir Singh [1]
Affiliations
[1] Lovely Profess Univ, Sch Comp Sci & Engn, Phagwara, Punjab, India
Keywords
AdaBoostM1; confusion matrix; DecisionStump; gradient boosting machine; J48; JRip; LMT; LogitBoost; OneR; PART; random forest; software metrics; software reuse; DATA MINING TECHNIQUES; MANAGEMENT
DOI
10.1002/spe.2921
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Cleaner Production (CP) is regarded as instrumental to achieving sustainable production in production companies. CP mainly deals with the three R's: reuse, reduce, and recycle. For software enterprises, software reuse plays a pivotal role. Software reuse is the process of producing new products or software from existing software by updating it. Data mining is used to extract useful information from existing software. The algorithms used for software reuse face issues related to maintenance cost, accuracy, and performance, and the algorithms currently in use do not accurately determine whether a software component can be reused. Machine learning gives the best results in predicting whether a given software component is reusable. This paper introduces an integrated Random Forest and Gradient Boosting Machine learning algorithm (RFGBM), which tests the reusability of given software code using object-oriented parameters such as cohesion, coupling, cyclomatic complexity, bugs, number of children, and depth of inheritance tree. The proposed algorithm is compared with the J48, AdaBoostM1, LogitBoost, PART, OneR, LMT, JRip, and DecisionStump algorithms. Performance metrics such as accuracy, error rate, Relative Absolute Error, and Mean Absolute Error are improved using RFGBM. The algorithm also preprocesses the data with an unsupervised filter that removes missing values to improve efficiency. The proposed algorithm outperforms the existing algorithms on these performance parameters.
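The four performance metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes binary labels (1 = reusable, 0 = not reusable), uses the common definition of Relative Absolute Error as the total absolute error divided by the total absolute error of a predictor that always outputs the mean label, and the function name `evaluation_metrics` and the sample labels are hypothetical.

```python
from statistics import mean


def evaluation_metrics(actual, predicted):
    """Return accuracy, error rate, MAE and RAE for 0/1 class labels.

    Hypothetical helper for illustration; labels are assumed to be
    1 (reusable) or 0 (not reusable).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)

    # Accuracy: fraction of components classified correctly.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    accuracy = correct / n

    # Mean Absolute Error: average absolute difference between labels.
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n

    # Relative Absolute Error: total absolute error relative to a
    # baseline that always predicts the mean of the actual labels.
    baseline = mean(actual)
    baseline_error = sum(abs(a - baseline) for a in actual)
    rae = sum(abs_errors) / baseline_error if baseline_error else float("nan")

    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "mae": mae,
        "rae": rae,
    }


# Made-up labels for eight software components (illustrative only).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = evaluation_metrics(actual, predicted)
print(metrics)
```

For hard 0/1 predictions the Mean Absolute Error equals the error rate; the two differ only when a classifier outputs class probabilities, which this sketch does not model.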
Pages: 735-747
Number of pages: 13