A framework for distributed nearest neighbor classification using Hadoop

Cited by: 1
Authors
Ding Q. [1]
Boykin R. [2]
Affiliations
[1] Department of Computer Science, East Carolina University, Greenville, 27858, NC
[2] Department of Computer Science, University of South Carolina, Columbia, 29208, SC
Source
Ding, Qin (dingq@ecu.edu) | Year: 1600 / Volume: IOS Press BV / Issue: 17
Funding
National Science Foundation (USA)
Keywords
classification; data mining; distributed data mining; Hadoop; K-Nearest Neighbor
DOI
10.3233/JCM-160676
Abstract
Within data mining and machine learning, K-Nearest Neighbor is a classic algorithm that simply yet elegantly classifies data based on its similarity to other data. While accuracy increases as more data are provided, large data sets are difficult to process serially. It is therefore preferable to perform these tasks in parallel or distributed mode. In this paper, we propose a framework for distributed nearest neighbor classification. A custom K-Nearest Neighbor algorithm was developed using Hadoop, an environment for developing and deploying applications in parallel on a cluster. The algorithm was implemented on a cluster and then tested for accuracy and execution time. The accuracy was observed to depend on the provided k-value and on the data set, as is expected for K-Nearest Neighbor. The execution time was found to increase logarithmically as the file size, and thus the amount of data the algorithm must parse, increased exponentially. © 2017 - IOS Press and the authors. All rights reserved.
Pages: S11 - S19
Number of pages: 8
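
The abstract does not spell out how the authors' Hadoop implementation divides the work, so the following is only a minimal sketch of one common way to distribute K-Nearest Neighbor in a map/reduce style, written in plain Python rather than against the Hadoop API. It assumes Euclidean distance and majority voting; the function names (`map_partition`, `reduce_candidates`, `classify`) and the data layout (lists of `(features, label)` pairs) are hypothetical and chosen purely for illustration.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_partition(partition, query, k):
    """Map step (assumed design): within one partition of the training
    data, return the k nearest (distance, label) pairs to the query."""
    scored = ((euclidean(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


def reduce_candidates(candidate_lists, k):
    """Reduce step (assumed design): merge the per-partition candidates,
    keep the k globally nearest, and return the majority label."""
    merged = [pair for candidates in candidate_lists for pair in candidates]
    nearest = heapq.nsmallest(k, merged, key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(partitions, query, k):
    """Run the map step on every partition, then a single reduce step."""
    return reduce_candidates([map_partition(p, query, k) for p in partitions], k)
```

Because each partition forwards at most k candidates, the reduce step sees at most k times the number of partitions, regardless of how large the training set is. On a real cluster the map calls would run in parallel on separate nodes; here they run sequentially to keep the sketch self-contained.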