Social Media Analysis using Optimized K-Means Clustering

Cited by: 0
Authors
Alsayat, Ahmed [1]
El-Sayed, Hoda [1]
Affiliations
[1] Bowie State Univ, Dept Comp Sci, Bowie, MD 20715 USA
Source
2016 IEEE/ACIS 14TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING RESEARCH, MANAGEMENT AND APPLICATIONS (SERA) | 2016
Keywords
K-Means; Genetic Algorithm; Clustering; Social Media Analysis; Data Mining
DOI
Not available
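Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract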
The increasing influence of social media and the enormous participation of users create new opportunities to study human social behavior, along with the capability to analyze large volumes of streaming data. One interesting problem is to distinguish between different kinds of users, for example users who act as leaders and introduce new issues and discussions on social media. Furthermore, positive or negative attitudes can be inferred from those discussions. Such problems require a formal interpretation of social media logs and of the unit of information that can spread from person to person through the social network. Once social media data such as user messages are parsed and network relationships are identified, data mining techniques can be applied to group different types of communities. However, existing methods rarely capture the appropriate granularity of user communities and their behavior. In this paper, we present a framework for the novel task of detecting communities by clustering messages from large streams of social data. Our framework uses the K-Means clustering algorithm together with a Genetic algorithm and the Optimized Cluster Distance (OCD) method to cluster data. The goal of the proposed framework is twofold: to overcome the problem of choosing the best initial centroids in general K-Means by using a Genetic algorithm, and to maximize the distance between clusters through pairwise clustering with OCD, yielding more accurate clusters. We used various cluster validation metrics to evaluate the performance of our algorithm. The analysis shows that the proposed method gives better clustering results and provides a novel use case of grouping user communities based on their activities. Our approach is optimized and scalable for real-time clustering of social media data.
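The idea of using a Genetic algorithm to pick initial centroids for K-Means can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans`, `ga_initial_centroids`), the chromosome encoding (a list of k data-point indices), the fitness (sum of squared errors after a few K-Means iterations), and all parameter values are assumptions made for illustration. The OCD step is omitted because the abstract does not define it.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iters=10):
    """Run plain K-Means (Lloyd) iterations; return (centroids, sse)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

def ga_initial_centroids(points, k, pop_size=12, generations=15,
                         mutation_rate=0.2, seed=0):
    """Assumed GA: evolve sets of k point indices toward low K-Means SSE."""
    rng = random.Random(seed)
    n = len(points)

    def fitness(chrom):
        # Lower is better: SSE after a short K-Means run from this seed set.
        return kmeans(points, [points[i] for i in chrom], iters=3)[1]

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(dict.fromkeys(a + b))  # distinct indices of both parents
            child = rng.sample(pool, k)        # crossover
            if rng.random() < mutation_rate:   # mutation: swap in an unused index
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return [points[i] for i in best]

# Synthetic stand-in data: three 2-D blobs (not social media features).
_rng = random.Random(1)
points = [(cx + _rng.gauss(0, 0.5), cy + _rng.gauss(0, 0.5))
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
          for _ in range(20)]

init = ga_initial_centroids(points, k=3)
centroids, sse = kmeans(points, init)
```

In a real pipeline the points would be feature vectors derived from messages (for example term weights), and the fitness could be replaced by any cluster validation metric; both choices are left open by the abstract.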
Pages: 61-66
Page count: 6