An Empirical Comparison of Supervised Machine Learning Algorithms For Internet of Things Data

Cited by: 0
Authors
Khadse, Vijay [1 ]
Mahalle, Parikshit N. [1 ]
Biraris, Swapnil V. [2 ]
Affiliations
[1] SKN Coll Engn Pune, Dept Comp Engn, Pune, Maharashtra, India
[2] Coll Engn Pune, Dept Comp Engn & IT, Pune, Maharashtra, India
Source
2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA) | 2018
Keywords
Internet of Things; Machine Learning; Kappa; Confusion Matrix; Cross-Validation; Precision; Recall; F1-score; Class Imbalance; Accuracy
DOI
None available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The Internet of Things (IoT) is one of the most rapidly growing fields and has a wide range of applications such as smart cities, smart homes, connected wearables, connected healthcare, and connected automobiles. These IoT applications generate tremendous amounts of data which need to be analyzed to draw useful inferences required to optimize the performance of IoT applications. Artificial intelligence (AI) and machine learning (ML) play a significant role in building smart IoT systems. The main objective of this paper is a comprehensive analysis of five well-known supervised machine learning algorithms on IoT datasets. The five classifiers are K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Logistic Regression (LR). Feature reduction is performed using the principal component analysis (PCA) algorithm. The performance of these five classifiers has been compared on the basis of six characteristics of IoT datasets: size, number of features, number of classes, class imbalance, missing values, and execution time. The classifiers have also been compared on various performance metrics such as precision, recall, F1-score, kappa, and accuracy. As per our results, the DT classifier gives the best accuracy of 99% among all the algorithms for all datasets. The results also show that the performance of RF and KNN is almost similar, and that NB and LR perform the worst among all the classifiers.
Pages: 6