A framework for distributed nearest neighbor classification using Hadoop

Cited by: 1
Authors
Ding Q. [1]
Boykin R. [2]
Affiliations
[1] Department of Computer Science, East Carolina University, Greenville, NC 27858
[2] Department of Computer Science, University of South Carolina, Columbia, SC 29208
Funding
National Science Foundation (USA)
Keywords
classification; data mining; distributed data mining; Hadoop; K-Nearest Neighbor
DOI
10.3233/JCM-160676
Abstract
Within data mining and machine learning, K-Nearest Neighbor is a classic algorithm that simply yet elegantly classifies data based on its similarity to other data. Although accuracy tends to increase as more data are provided, large data sets are difficult to process serially, so it is preferable to perform these tasks in parallel or distributed mode. In this paper, we propose a framework for distributed nearest neighbor classification. A custom K-Nearest Neighbor algorithm was developed using Hadoop, an environment for developing and deploying applications in parallel on a cluster. The algorithm was implemented on a cluster and then tested for accuracy and execution time. The accuracy was observed to depend on the provided k-value and on the data set, as expected for K-Nearest Neighbor. The execution time was found to increase logarithmically as the file size, and thus the amount of data the algorithm must parse, increased exponentially. © 2017 - IOS Press and the authors. All rights reserved.
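The abstract does not spell out how the classification is split across the cluster, so the following is only a minimal sketch of one common way to phrase K-Nearest Neighbor in map/reduce terms, not the authors' implementation. It assumes Euclidean distance, majority voting, and in-memory lists standing in for Hadoop input splits; the function names `map_partition`, `reduce_neighbors`, and `classify` are hypothetical.

```python
import heapq
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def map_partition(partition, query, k):
    """Map step (assumed): the k nearest (distance, label) pairs in one partition."""
    scored = ((dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_neighbors(local_results, k):
    """Reduce step (assumed): merge candidates, keep the global k nearest, then vote."""
    candidates = (pair for result in local_results for pair in result)
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(partitions, query, k):
    """Run the map step on every partition, then a single reduce step."""
    return reduce_neighbors([map_partition(p, query, k) for p in partitions], k)


# Toy training data split into two partitions, standing in for input splits.
partitions = [
    [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b")],
    [((0.9, 1.1), "a"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")],
]
print(classify(partitions, (1.0, 0.9), k=3))
```

Because each partition forwards at most k candidates, the reduce step handles a number of records proportional to k times the number of partitions rather than the full data set, which is the usual motivation for this split.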
Pages: S11-S19
Page count: 8