Automatic fast double KNN classification algorithm based on ACC and hierarchical clustering for big data

Cited by: 15
Authors
Li, Haiyun [1 ]
Li, Haifeng [1 ]
Wei, Kaibin [1 ]
Affiliation
[1] Tianshui Normal Univ, Sch Elect Informat & Elect Engn, Tianshui 741001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
automatic classification; automatically determining the cluster centers; big data; clustering; hierarchical clustering; k-nearest neighbors;
DOI
10.1002/dac.3488
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
In data mining, the k-nearest neighbors (KNN) classifier is a simple yet efficient lazy learning method that has been used successfully in many practical applications. However, because of KNN's time and memory requirements, its classification accuracy is very low when it is tested on large-scale datasets. We therefore propose an automatic fast double KNN classification algorithm based on automatically determining the cluster centers (ACC) and hierarchical clustering. In the training process, we introduce automatic determination of the cluster centers into KNN: the big data samples are divided into several parts by our clustering method. In the testing process, the clusters nearest to the testing samples are then selected as the new training samples, and hierarchical clustering is applied to each of these new samples. In this way, computation and time complexity are greatly reduced. Finally, experimental results on big data show that the new KNN classification method significantly improves the accuracy and efficiency of automatic classification compared with other state-of-the-art KNN classification algorithms.
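The general idea in the abstract (partition the training data, then run KNN only inside the cluster nearest to a test sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: a plain k-means refinement from hand-picked initial centers stands in for the automatic cluster-center step, the hierarchical clustering stage is omitted, and the names `partition`, `classify`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def assign(points, centers):
    """Group point indices by their nearest center."""
    groups = [[] for _ in centers]
    for idx, p in enumerate(points):
        groups[nearest(p, centers)].append(idx)
    return groups


def partition(points, init_centers, iters=10):
    """ASSUMPTION: plain k-means refinement, used here as a stand-in for the
    paper's automatic cluster-center determination (not reproduced here)."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iters):
        groups = assign(points, centers)
        updated = []
        for group, old in zip(groups, centers):
            if group:
                updated.append(tuple(
                    sum(points[i][d] for i in group) / len(group)
                    for d in range(len(old))
                ))
            else:
                updated.append(old)  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def classify(query, points, labels, centers, groups, k=3):
    """KNN restricted to the cluster whose center is nearest to `query`."""
    candidates = groups[nearest(query, centers)]
    if not candidates:  # fall back to the full training set
        candidates = list(range(len(points)))
    ranked = sorted(candidates, key=lambda i: dist(query, points[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a"] * 4 + ["b"] * 4
centers, groups = partition(points, [points[0], points[4]])
```

Restricting the neighbor search to one cluster means each query is compared against the cluster centers plus the members of a single cluster, rather than against every training sample, which is where the reduction in computation comes from.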
Pages: 11