Automatic fast double KNN classification algorithm based on ACC and hierarchical clustering for big data

Cited by: 15
Authors
Li, Haiyun [1]
Li, Haifeng [1]
Wei, Kaibin [1]
Affiliations
[1] Tianshui Normal Univ, Sch Elect Informat & Elect Engn, Tianshui 741001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
automatic classification; automatically determining the cluster centers; big data; clustering; hierarchical clustering; k-nearest neighbors;
DOI
10.1002/dac.3488
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
In data mining, the k-nearest neighbors (KNN) classifier is a simple yet efficient and widely known lazy-learning method that has been applied successfully in many real-world applications. Because of the time and memory constraints of KNN, its classification accuracy is very low when it is applied to large-scale datasets. We therefore propose an automatic fast double KNN classification algorithm based on automatically determining the cluster centers (ACC) and hierarchical clustering. In the training process, we introduce automatic determination of the cluster centers into KNN; that is, the big data samples are divided into several parts by our clustering method. In the testing process, the clusters nearest to the testing samples are then selected as the new training samples, and hierarchical clustering is applied to each of these new training sets. In this way, computation and time complexity are greatly reduced. Finally, experimental results on big data show that the new KNN classification method significantly improves the accuracy and efficiency of automatic classification compared with other state-of-the-art KNN classification algorithms.
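The cluster-then-search idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: plain k-means is swapped in for the ACC step (the abstract does not specify how cluster centers are determined), the second hierarchical-clustering stage is omitted, and the function names `kmeans`, `fit`, and `predict` are hypothetical. It only shows how restricting the KNN search to the cluster nearest a test sample shrinks the set of training points that must be scanned.

```python
import math
import random
from collections import Counter


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; an assumed stand-in for the paper's ACC step."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def fit(samples, labels, n_clusters):
    """Training step: partition the labeled samples into clusters."""
    points = [tuple(s) for s in samples]
    centers = kmeans(points, n_clusters)
    clusters = [[] for _ in centers]
    for p, y in zip(points, labels):
        clusters[nearest(p, centers)].append((p, y))
    return centers, clusters


def predict(x, centers, clusters, k=3):
    """Testing step: run KNN only inside the nearest non-empty cluster."""
    candidates = [i for i, members in enumerate(clusters) if members]
    best = min(candidates, key=lambda i: math.dist(x, centers[i]))
    neighbours = sorted(clusters[best], key=lambda m: math.dist(x, m[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With `centers, clusters = fit(samples, labels, n_clusters)` computed once, each call to `predict(x, centers, clusters)` compares `x` against the cluster centers and the members of a single cluster rather than the whole training set, which is the source of the reduced computation the abstract refers to.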
Pages: 11