Sampling-based visual assessment computing techniques for an efficient social data clustering

Cited by: 11
Authors
Basha, M. Suleman [1]
Mouleeswaran, S. K. [1]
Prasad, K. Rajendra [2]
Affiliations
[1] Dayananda Sagar Univ, Dept Comp Sci & Engn, Bangalore, Karnataka, India
[2] Rajeev Gandhi Mem Coll Engn & Technol, Dept Comp Sci & Engn, Nandyal, India
Keywords
Cluster tendency; Social data clustering; Scalability; Visual methods; Feature extraction; Algorithms; Framework
DOI
10.1007/s11227-021-03618-6
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Visual methods are used for pre-clustering assessment and for deriving useful cluster partitions. Existing visual methods, such as visual assessment of tendency (VAT), spectral VAT (SpecVAT), cosine-based VAT (cVAT), and multi-viewpoints cosine-based similarity VAT (MVS-VAT), effectively assess the number of clusters, that is, the cluster tendency. Partitioning tweet data is the underlying problem of social data clustering. Cosine-based visual methods have been widely successful in text data clustering, so cVAT and MVS-VAT are the best-suited methods for deriving social data clusters. However, MVS-VAT faces scalability issues in computational time and memory allocation. This paper therefore presents a sampling-based MVS-VAT computing technique that selects sample inter-cluster viewpoints to overcome the scalability problem in social data clustering. In the experiments, standard health keywords and the benchmark TREC2017 and TREC2018 health keywords are used to extract health tweets, and the performance of the existing and proposed visual methods is compared.
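For readers unfamiliar with the VAT family, the following is a minimal sketch of the basic idea behind a cosine-based VAT: compute pairwise cosine dissimilarities, then reorder the objects with a Prim-like nearest-neighbour sweep so that similar objects become adjacent and clusters appear as dark diagonal blocks when the reordered matrix is displayed as an image. It is not the authors' implementation. The function names `cosine_dissimilarity` and `vat_order` and the toy matrix `X` are illustrative assumptions, and the multi-viewpoint similarity and sampling steps of the proposed method are not reproduced here.

```python
import numpy as np


def cosine_dissimilarity(X):
    """Pairwise cosine dissimilarity (1 - cosine similarity) between rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty vectors
    U = X / norms
    D = 1.0 - U @ U.T
    np.fill_diagonal(D, 0.0)         # remove floating-point noise on the diagonal
    return np.clip(D, 0.0, 2.0)


def vat_order(D):
    """Return a VAT-style ordering of the objects in dissimilarity matrix D."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity in the matrix.
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    min_dist = D[first].copy()       # distance from each object to the visited set
    for _ in range(n - 1):
        # Pick the unvisited object closest to any visited object.
        candidates = np.where(remaining, min_dist, np.inf)
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(order)


# Toy term-weight vectors (hypothetical): even rows and odd rows form two groups.
X = np.array([
    [1.0, 0.1],
    [0.1, 1.0],
    [1.0, 0.2],
    [0.2, 1.0],
    [0.9, 0.1],
    [0.1, 0.9],
])
D = cosine_dissimilarity(X)
order = vat_order(D)
D_reordered = D[np.ix_(order, order)]  # display this matrix as an image to inspect blocks
```

In this toy case the ordering places the three rows of one group before the three rows of the other, so `D_reordered` shows two low-dissimilarity blocks along the diagonal. The full matrix needs memory quadratic in the number of objects, which is the kind of scalability cost the abstract attributes to MVS-VAT.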
Pages: 8013-8037
Number of pages: 25
Related papers
29 in total
[1]   Is Normalized Mutual Information a Fair Measure for Comparing Community Detection Methods? [J].
Amelio, Alessia ;
Pizzuti, Clara .
PROCEEDINGS OF THE 2015 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2015), 2015, :1584-1585
[2]   RIFT: A Rule Induction Framework for Twitter Sentiment Analysis [J].
Asghar, Muhammad Zubair ;
Khan, Aurangzeb ;
Khan, Furqan ;
Kundi, Fazal Masud .
ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2018, 43 (02) :857-877
[3]  
Bezdek J.L, 2008, IEEE INT C DAT MIN I
[4]   VAT: A tool for visual assessment of (cluster) tendency [J].
Bezdek, JC ;
Hathaway, RJ .
PROCEEDINGS OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, :2225-2230
[5]   Comparative Performance Evaluation of Clustering Algorithms for Grouping Manufacturing Firms [J].
Bhatnagar, Vikas ;
Majhi, Ritanjali ;
Jena, Pradyot Ranjan .
ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2018, 43 (08) :4071-4083
[6]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[7]   Crisp partitions induced by a fuzzy set [J].
Bodjanova, Slavka .
Data Science and Classification, 2006, :75-82
[8]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[10]   Probabilistic latent semantic indexing [J].
Hofmann, T .
SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, :50-57