A Clustering Hybrid Algorithm for Smart Datasets using Machine Learning

Cited by: 0
Authors
Amin, Dar Masroof [1 ]
Rai, Munishwar [1 ]
Affiliations
[1] Deemed Univ, MMICT & BM Maharishi Markandeshwar, Mullana 133203, Haryana, India
Funding
UK Research and Innovation;
Keywords
Random Forests (RF); Jaccard Similarity (JS); triangle; smart data; root mean square error; mean absolute error; machine learning; RANDOM FOREST; THINGS IOT; INTERNET;
DOI
10.14569/IJACSA.2020.0110919
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In data science, machine learning is a subfield concerned with designing algorithms that learn from previous information and make future predictions accordingly. Traditionally, machine learning was performed on high-performance servers and machines. Applying these concepts to Big Data analytics algorithms has high potential and is still in its early stages. In machine learning, the performance measure is an important parameter for evaluating the overall functionality of an algorithm. Performance is measured on previously unseen data, called the test set, whereas the training set is the data on which the algorithm is trained. Data mining makes extensive use of learning algorithms to analyze data and to formulate future predictions based on archived data. The research presented here is a step toward deriving smart data sets from a training data set by evaluating a machine learning algorithm. It presents a novel hybrid algorithm that attempts to incorporate similarity features into the Random Forest machine learning algorithm in order to improve classification accuracy and efficiency.
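The abstract and keywords name Jaccard similarity, root mean square error, and mean absolute error, but do not say how the similarity measure is combined with Random Forest. The following is only a minimal Python sketch of the standard definitions of those three quantities; the function names are illustrative assumptions and are not taken from the paper, and the block does not reproduce the authors' hybrid algorithm.

```python
import math


def jaccard_similarity(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def _check_lengths(y_true, y_pred):
    """Reject empty or mismatched inputs before computing an error metric."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between targets and predictions."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """RMSE: square root of the average squared difference."""
    _check_lengths(y_true, y_pred)
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    # Two feature sets sharing 2 of 4 distinct items.
    print(jaccard_similarity([1, 2, 3], [2, 3, 4]))
    # Errors measured on a held-out test set of three samples.
    print(mean_absolute_error([1, 2, 3], [1, 2, 5]))
    print(root_mean_square_error([1, 2, 3], [1, 2, 5]))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does, which is one reason the two are often reported together.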
Pages: 165-172
Number of pages: 8