Automatic fast double KNN classification algorithm based on ACC and hierarchical clustering for big data

Cited by: 15
Authors
Li, Haiyun [1 ]
Li, Haifeng [1 ]
Wei, Kaibin [1 ]
Affiliations
[1] Tianshui Normal Univ, Sch Elect Informat & Elect Engn, Tianshui 741001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
automatic classification; automatically determining the cluster centers; big data; clustering; hierarchical clustering; k-nearest neighbors;
DOI
10.1002/dac.3488
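Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code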
0808 ; 0809 ;
Abstract
In data mining, the k-nearest neighbors (KNN) classifier is a simple, widely known lazy learning method that has been used successfully in many practical applications. Because of the time and memory restrictions of KNN, classification accuracy is very low when KNN is tested on large-scale datasets. We therefore propose an automatic fast double KNN classification algorithm based on automatically determining the cluster centers (ACC) and hierarchical clustering. In the training process, we introduce automatic determination of the cluster centers into KNN; that is, the big data samples are divided into several parts by our clustering method. In the testing process, the clusters nearest to the testing samples are then selected as the new training samples, and hierarchical clustering is applied to each of these new samples. In this way, computation and time complexity are greatly reduced. Finally, experimental results on big data show that the new KNN classification method significantly improves the accuracy and efficiency of automatic classification compared with other state-of-the-art KNN classification algorithms.
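The general idea in the abstract, partitioning the training data first and then running KNN only inside the cluster nearest to a test sample, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the cluster centers are determined automatically, so plain k-means with a fixed number of clusters is used here as a stand-in, and the hierarchical clustering step is omitted. All function names (`kmeans`, `classify`, and the helpers) and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def assign(points, centers):
    """Group point indices by their nearest center."""
    groups = [[] for _ in centers]
    for i, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
        groups[nearest].append(i)
    return groups


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain k-means. ASSUMPTION: a stand-in for the paper's clustering step."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        groups = assign(points, centers)
        # An empty cluster keeps its previous center.
        centers = [mean_point([points[i] for i in g]) if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers, assign(points, centers)


def classify(query, points, labels, centers, groups, k=3):
    """Majority vote among the k nearest points of the nearest non-empty cluster."""
    order = sorted(range(len(centers)), key=lambda c: dist(query, centers[c]))
    candidates = next(groups[c] for c in order if groups[c])
    nearest = sorted(candidates, key=lambda i: dist(query, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a"] * 4 + ["b"] * 4
centers, groups = kmeans(points, n_clusters=2)
print(classify((0.5, 0.5), points, labels, centers, groups))
print(classify((10.5, 10.5), points, labels, centers, groups))
```

The saving comes from `classify` computing distances only to the cluster centers and to the members of one cluster, rather than to every training point. The trade-off is that a query near a cluster boundary may miss true neighbors that fall in an adjacent cluster.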
Pages: 11
Related papers
50 in total
  • [31] An Automatic Data Clustering Algorithm based on Differential Evolution
    Tsai, Chun-Wei
    Tai, Chiech-An
    Chiang, Ming-Chao
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 794 - 799
  • [32] An automatic clustering algorithm based on data contained ratio
    1600, Northwestern Polytechnical University (34):
  • [33] A Fast Clustering-based Recommender System for Big Data
    Hong-Quan Do
    Nguyen, T. H-An
    Quoc-Anh Nguyen
    Trung-Hieu Nguyen
    Viet-Vu Vu
    Cuong Le
    2022 24TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): ARTIFICIAL INTELLIGENCE TECHNOLOGIES TOWARD CYBERSECURITY, 2022, : 353 - +
  • [34] A "big-data" algorithm for KNN-PLS
    Metz, Maxime
    Lesnoff, Matthieu
    Abdelghafour, Florent
    Akbarinia, Reza
    Masseglia, Florent
    Roger, Jean-Michel
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2020, 203
  • [35] Adaptive clustering algorithm based on kNN and density
    Shi, Bing
    Han, Lixin
    Yan, Hong
    PATTERN RECOGNITION LETTERS, 2018, 104 : 37 - 44
  • [36] Clustering ensemble based on the fuzzy KNN algorithm
    Weng, Fangfei
    Jiang, Qingshan
    Chen, Lifei
    Hong, Zhiling
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 1001 - +
  • [37] A new clustering algorithm based on KNN and DENCLUE
    Yu, XG
    Jian, Y
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 2033 - 2038
  • [38] A fast algorithm for hierarchical text classification
    Chuang, WT
    Tiyyagura, A
    Yang, J
    Giuffrida, G
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2000, 1874 : 409 - 418
  • [39] A hierarchical clustering method based on the threshold of semantic feature in big data
    School of Information Science and Engineering, Central South University, Changsha
    410083, China
    Unknown
    425006, China
    Dianzi Yu Xinxi Xuebao, 12 (2795-2801):
  • [40] A Fast Clustering Algorithm for Analyzing Big Data Generated in Ubiquitous Sensor Networks
    Zahwe, Oussama
    Majed, Ola
    Harb, Hassan
    Hamze, Mohamad
    Nasser, Abbass
    2018 19TH INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2018, : 142 - 147