A framework for distributed nearest neighbor classification using Hadoop

Cited by: 1
Authors
Ding Q. [1]
Boykin R. [2]
Affiliations
[1] Department of Computer Science, East Carolina University, Greenville, NC 27858
[2] Department of Computer Science, University of South Carolina, Columbia, SC 29208
Source
Ding, Qin (dingq@ecu.edu) | IOS Press BV / 17
Funding
National Science Foundation (United States)
Keywords
Classification; data mining; distributed data mining; Hadoop; K-Nearest Neighbor
DOI
10.3233/JCM-160676
Abstract
Within data mining and machine learning, K-Nearest Neighbor is a classic algorithm that simply yet elegantly classifies data based on its similarity to other data. While accuracy increases as more data are provided, large data sets are difficult to process serially. It is therefore preferable to perform these tasks in parallel or distributed mode. In this paper, we propose a framework for distributed nearest neighbor classification. A custom K-Nearest Neighbor algorithm was developed using Hadoop, an environment for developing and deploying applications in parallel on a cluster. The algorithm was implemented on a cluster and then tested for accuracy and execution time. The accuracy was observed to depend on the provided k-value and on the data set, as expected for K-Nearest Neighbor. The execution time was found to increase logarithmically as the file size, and thus the amount of data the algorithm must parse, increases exponentially. © 2017 - IOS Press and the authors. All rights reserved.
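The abstract does not give the details of the Hadoop implementation, so the following is only a minimal single-process sketch of how nearest neighbor classification can be divided into a map step and a reduce step. It is not the authors' code. The function names (`map_partition`, `reduce_candidates`, `classify`), the use of Euclidean distance, and the majority vote are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def map_partition(partition, query, k):
    """Map step (hypothetical): k nearest (distance, label) pairs in one data split."""
    scored = ((math.dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_candidates(candidate_lists, k):
    """Reduce step (hypothetical): merge per-split candidates and take a majority vote."""
    merged = heapq.nsmallest(k, (c for lst in candidate_lists for c in lst))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def classify(partitions, query, k):
    """Run the map step on every split, then reduce the results to one label."""
    return reduce_candidates([map_partition(p, query, k) for p in partitions], k)


if __name__ == "__main__":
    # Toy training data split into two partitions, as separate input splits might be.
    partitions = [
        [((1.0, 1.0), "a"), ((1.2, 0.8), "a")],
        [((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((0.9, 1.1), "a")],
    ]
    print(classify(partitions, (1.0, 0.9), k=3))
```

Each split only needs to report its own k closest points, because the global k nearest neighbors are always contained in the union of the per-split results. This keeps the data passed to the reduce step small regardless of the training set size.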
Pages: S11-S19
Page count: 8