A framework for distributed nearest neighbor classification using Hadoop

Cited by: 1
Authors
Ding Q. [1]
Boykin R. [2]
Affiliations
[1] Department of Computer Science, East Carolina University, Greenville, 27858, NC
[2] Department of Computer Science, University of South Carolina, Columbia, 29208, SC
Source
Ding, Qin (dingq@ecu.edu) | Year: 1600 / Volume: IOS Press BV / Issue: 17
Funding
National Science Foundation (US);
Keywords
classification; Data mining; distributed data mining; Hadoop; K-Nearest Neighbor;
DOI
10.3233/JCM-160676
Abstract
Within data mining and machine learning, K-Nearest Neighbor is a classic algorithm that simply yet elegantly classifies data based on its similarity to other data. While accuracy increases as more data are provided, large data sets are difficult to process serially. It is therefore preferable to perform these tasks in parallel or distributed mode. In this paper, we propose a framework for distributed nearest neighbor classification. A custom K-Nearest Neighbor algorithm was developed using Hadoop, an environment for developing and deploying applications in parallel on a cluster. The algorithm was implemented on a cluster and then tested for accuracy and execution time. The accuracy was observed to depend on the chosen k-value and on the data set, as expected for K-Nearest Neighbor. The execution time was found to increase logarithmically as the file size, and thus the amount of data the algorithm must parse, increased exponentially. © 2017 - IOS Press and the authors. All rights reserved.
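The idea of distributing nearest neighbor classification can be illustrated with a minimal sketch. It is not the authors' Hadoop implementation, which this record does not describe in detail. It assumes a common map/reduce-style decomposition: each data partition reports its own k nearest labeled points for a query (the "map" step), and a single merge step keeps the global k nearest and takes a majority vote (the "reduce" step). The function names `local_knn` and `classify`, the Euclidean distance, and the in-memory partitions are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def local_knn(partition, query, k):
    """Map step (illustrative): k nearest (distance, label) pairs in one partition."""
    scored = ((math.dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def classify(partitions, query, k):
    """Reduce step (illustrative): merge local candidates, keep the global k, vote."""
    # Any globally nearest point must also be among the k nearest of its own
    # partition, so merging the local top-k lists loses no candidates.
    candidates = [pair for part in partitions for pair in local_knn(part, query, k)]
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two partitions of ((x, y), label) records.
partitions = [
    [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b")],
    [((0.2, 0.1), "a"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")],
]
predicted = classify(partitions, (0.0, 0.1), k=3)
```

In a real cluster the per-partition step would run on separate nodes and only the small candidate lists would travel over the network, which is the usual motivation for this decomposition.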
Pages: S11-S19
Page count: 8