Automatic fast double KNN classification algorithm based on ACC and hierarchical clustering for big data

Times cited: 15
Authors
Li, Haiyun [1]
Li, Haifeng [1]
Wei, Kaibin [1]
Affiliation
[1] Tianshui Normal Univ, Sch Elect Informat & Elect Engn, Tianshui 741001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
automatic classification; automatically determining the cluster centers; big data; clustering; hierarchical clustering; k-nearest neighbors;
DOI
10.1002/dac.3488
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract
In data mining, the k-nearest neighbors (KNN) classifier is a simple, efficient, and widely known lazy-learning method that has been applied successfully in many practical applications. However, because of its time and memory requirements, KNN achieves very low classification accuracy when it is applied to large-scale datasets. We therefore propose an automatic fast double KNN classification algorithm based on automatic determination of cluster centers and hierarchical clustering. In the training process, automatic cluster-center determination is incorporated into KNN: the big-data samples are divided into several parts by our clustering method. In the testing process, the clusters nearest to each test sample are then extracted as the new training samples, and these new training samples are further processed with hierarchical clustering. In this way, the computational and time complexity are greatly reduced. Experimental results on big data show that the new KNN classification method achieves significantly higher accuracy and efficiency in automatic classification than other state-of-the-art KNN classification algorithms.
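The abstract describes the classifier only at a high level. As a rough illustration of the general cluster-then-search idea, and not the authors' algorithm, the following sketch partitions the training set once and, at test time, runs plain KNN only on the samples of the nearest cluster(s). Assumptions: Lloyd's k-means with deterministic farthest-first initialization stands in for the paper's automatic cluster-center determination (ACC); the subsequent hierarchical-clustering stage is omitted because the abstract does not say how it feeds into the final decision; and the names `ClusteredKNN`, `n_probe`, and `candidate_indices` are hypothetical.

```python
import math
from collections import Counter


def farthest_first(points, n):
    """Deterministic init: start at points[0], then repeatedly add the point farthest from all chosen centers."""
    centers = [list(points[0])]
    while len(centers) < n:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def assign(points, centers):
    """Group point indices by their nearest center (ties go to the lower center index)."""
    groups = [[] for _ in centers]
    for idx, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[nearest].append(idx)
    return groups


def kmeans(points, n_clusters, iters=20):
    """Plain Lloyd's k-means; an assumed stand-in for the paper's ACC step."""
    centers = farthest_first(points, n_clusters)
    dim = len(points[0])
    for _ in range(iters):
        groups = assign(points, centers)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(points[i][d] for i in g) / len(g) for d in range(dim)]
    return centers, assign(points, centers)


def knn_vote(train_X, train_y, x, k):
    """Majority vote among the k nearest samples; ties go to the label seen first (i.e. nearest)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


class ClusteredKNN:
    """Hypothetical two-stage classifier: pick nearby clusters, then run KNN inside them."""

    def __init__(self, n_clusters=2, n_probe=1, k=3):
        self.n_clusters, self.n_probe, self.k = n_clusters, n_probe, k

    def fit(self, X, y):
        # Training: partition the full data set once.
        self.X, self.y = list(X), list(y)
        self.centers, self.groups = kmeans(self.X, self.n_clusters)
        return self

    def candidate_indices(self, x):
        # Testing, stage 1: keep only samples from the n_probe nearest non-empty clusters.
        order = sorted((c for c in range(len(self.centers)) if self.groups[c]),
                       key=lambda c: math.dist(x, self.centers[c]))
        return [i for c in order[: self.n_probe] for i in self.groups[c]]

    def predict(self, x):
        # Testing, stage 2: ordinary KNN on the reduced training set.
        idx = self.candidate_indices(x)
        return knn_vote([self.X[i] for i in idx], [self.y[i] for i in idx], x, self.k)


# Usage on two well-separated toy groups.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = ["a"] * 4 + ["b"] * 4
model = ClusteredKNN(n_clusters=2, n_probe=1, k=3).fit(X, y)
print(model.predict((0.5, 0.5)))    # a
print(model.predict((10.5, 10.5)))  # b
```

Searching only the `n_probe` nearest clusters shrinks each query from the full training set to a fraction of it (here, 4 of 8 samples). Raising `n_probe` trades speed for a lower risk of missing true neighbors that fall just across a cluster boundary.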
Pages: 11