Hybrid visual computing models to discover the clusters assessment of high dimensional big data

Cited by: 5
Authors
Basha, M. Suleman [1]
Mouleeswaran, S. K. [1]
Prasad, K. Rajendra [2]
Affiliations
[1] Dayananda Sagar Univ, Dept CSE, Bangalore, Karnataka, India
[2] RGM Coll Engn & Technol, Dept CSE, Nandyal, India
Keywords
Data clustering; Cluster tendency; Visual models; Big data; Subspace learning; Algorithms
DOI
10.1007/s00500-022-07092-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cluster assessment is a major recognized problem in big data clustering. Leading big data partitioning techniques, such as spherical k-means and mini-batch k-means, are widely used in large-data applications. However, they need prior information from cluster assessment to determine the quality of clusters over the big data. Existing visual models, namely clustering with improved visual assessment of tendency and sample viewpoints cosine-based similarity VAT (SVPCS-VAT), perform cluster assessment of big data efficiently. For high-dimensional big data, SVPCS-VAT is enhanced with subspace learning techniques: principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projection (LPP), and neighborhood preserving embedding (NPE). These are used to develop hybrid visual computing models, namely PCA-based SVPCS-VAT, LDA-based SVPCS-VAT, LPP-based SVPCS-VAT, and NPE-based SVPCS-VAT, to overcome the curse of dimensionality. Experiments are conducted on benchmark datasets to demonstrate the efficiency of these models and compare it with state-of-the-art big data clustering methods.
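The pipeline the abstract describes (reduce dimensionality with a subspace method, then build a cosine-based dissimilarity matrix and reorder it with VAT) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain pairwise cosine dissimilarity in place of the paper's sample-viewpoints measure (SVPCS), omits the sampling step, and uses PCA as the only subspace method. The function names `pca_project`, `cosine_dissimilarity`, `vat_order`, and `pca_cosine_vat` are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def cosine_dissimilarity(X):
    """Pairwise 1 - cos(x_i, x_j). Assumption: plain cosine, not SVPCS."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms == 0, 1.0, norms)  # guard against zero vectors
    D = 1.0 - U @ U.T
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, 2.0)

def vat_order(D):
    """VAT reordering (Bezdek and Hathaway, 2002): a Prim-style ordering."""
    n = D.shape[0]
    # Start from one endpoint of the most dissimilar pair.
    first = np.unravel_index(np.argmax(D), D.shape)[0]
    order = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    # min_dist[j]: smallest dissimilarity from j to any already-ordered object.
    min_dist = D[first].copy()
    for _ in range(n - 1):
        cand = np.where(remaining)[0]
        nxt = cand[np.argmin(min_dist[cand])]
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(order)

def pca_cosine_vat(X, n_components=2):
    """Return the VAT-reordered dissimilarity matrix and the ordering."""
    Z = pca_project(X, n_components)
    D = cosine_dissimilarity(Z)
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

Displaying the reordered matrix as a grayscale image (for example with matplotlib's `imshow`) would be expected to show dark square blocks along the diagonal when the data has cluster structure; the number of blocks suggests the number of clusters to pass to a partitioning method such as k-means.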
Pages: 4249-4262
Number of pages: 14