A Multi-Attributed Graph-Based Approach for Text Data Modeling and Event Detection in Twitter

Cited by: 0
Authors
Abulaish, Muhammad [1]
Sharma, Sielvie [2]
Fazil, Mohd [3]
Affiliations
[1] South Asian Univ, Dept Comp Sci, Delhi, India
[2] Jamia Millia Islamia, Dept Comp Engn, Delhi, India
[3] Jamia Millia Islamia, Dept Comp Sci, Delhi, India
Source
2019 11TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS (COMSNETS) | 2019
Keywords
Social Network Analysis; Text Data Modeling; Word2Vec; Multi-Attributed Social Graph; Markov Clustering; Event Detection
DOI
10.1109/comsnets.2019.8711451
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The popularity of microblogging sites such as Twitter is increasing exponentially. Twitter allows users to post short messages (aka tweets) of at most 280 characters, mainly for sharing news and event updates. Besides textual data, Twitter data also contains multi-dimensional connections among users, who may follow each other or have common followers/followees. Similarly, multi-dimensional connections exist among tweets that contain common hashtags, mentions, etc. In recent years, Word2Vec has been used extensively to analyze textual data, and it has shown promising results in many domains. In this paper, we propose a multi-attributed graph-based approach for text data modeling and event detection in Twitter. To this end, we generate a multi-attributed social graph (MASG) in which nodes represent tweets and are labeled with numeric vectors obtained through a Word2Vec model, while edges represent the structural relationships among the tweets and can also be labeled with numeric vectors. Thereafter, the MASG is converted into a similarity graph using a distance function, and the Markov clustering algorithm is applied over the similarity graph to identify clusters, each of which corresponds to a particular event. The proposed approach is evaluated over real-world Twitter datasets using standard evaluation metrics, including the true positive rate (TPR) and false positive rate (FPR). It is also compared with several baseline methods and performs significantly better than them.
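The abstract outlines a pipeline: Word2Vec vectors as node labels, attributed edges built from shared hashtags and mentions, a distance-based similarity graph, and Markov clustering (MCL) to group tweets into events. The Python sketch below strings these stages together on toy data. It is not the authors' implementation. The toy word vectors stand in for a trained Word2Vec model, and the following are all assumptions made for illustration: mean pooling of word vectors, the two-element edge vector `[shared hashtags, shared mentions]`, the distance formula, the 0.6 similarity threshold, and the pair-counting TPR/FPR definition. The `mcl` function follows the standard expansion/inflation scheme.

```python
import math

def tweet_vector(tokens, word_vecs, dim):
    """Node label: mean of the word vectors of known tokens (assumed pooling)."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def edge_vector(a, b):
    """Hypothetical edge label: [shared hashtags, shared mentions] of two tweets."""
    return [len(a["hashtags"] & b["hashtags"]), len(a["mentions"] & b["mentions"])]

def similarity_graph(tweets, word_vecs, dim, threshold=0.6):
    """Hypothetical distance: Euclidean node distance shrunk by structural overlap."""
    xs = [tweet_vector(t["tokens"], word_vecs, dim) for t in tweets]
    n = len(tweets)
    S = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(xs[u], xs[v]) / (1.0 + sum(edge_vector(tweets[u], tweets[v])))
            s = 1.0 / (1.0 + d)  # map distance to a similarity in (0, 1]
            if s >= threshold:
                S[u][v] = S[v][u] = s
    return S

def normalize_columns(M):
    """Scale each column in place so that it sums to 1 (column-stochastic)."""
    n = len(M)
    for j in range(n):
        total = sum(M[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                M[i][j] /= total
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcl(S, expansion=2, inflation=2.0, max_iter=100, prune=1e-6, tol=1e-9):
    """Markov clustering: alternate expansion and inflation until M stabilizes."""
    n = len(S)
    # Add self-loops, then normalize into a column-stochastic transition matrix.
    M = normalize_columns([[S[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        E = M
        for _ in range(expansion - 1):  # expansion: E = M ** expansion
            E = matmul(E, M)
        # Inflation: element-wise power, drop tiny entries, renormalize columns.
        powered = [[x ** inflation for x in row] for row in E]
        M_new = normalize_columns([[x if x > prune else 0.0 for x in row] for row in powered])
        converged = all(abs(M_new[i][j] - M[i][j]) < tol for i in range(n) for j in range(n))
        M = M_new
        if converged:
            break
    # Each non-empty (attractor) row collects the tweets that flow into it.
    clusters = []
    for row in M:
        members = sorted(j for j, x in enumerate(row) if x > prune)
        if members and members not in clusters:
            clusters.append(members)
    return clusters

def pairwise_tpr_fpr(clusters, labels):
    """One possible pair-counting TPR/FPR against gold event labels (assumption)."""
    cluster_of = {m: cid for cid, members in enumerate(clusters) for m in members}
    tp = fn = fp = tn = 0
    for a in range(len(labels)):
        for b in range(a + 1, len(labels)):
            same_pred = a in cluster_of and cluster_of.get(a) == cluster_of.get(b)
            same_true = labels[a] == labels[b]
            if same_true and same_pred:
                tp += 1
            elif same_true:
                fn += 1
            elif same_pred:
                fp += 1
            else:
                tn += 1
    return tp / max(tp + fn, 1), fp / max(fp + tn, 1)

# Toy vectors standing in for a trained Word2Vec model (hypothetical values).
WORD_VECS = {"goal": [0.9, 0.1, 0.0], "match": [0.8, 0.2, 0.1],
             "quake": [0.0, 0.1, 0.9], "tremor": [0.1, 0.0, 0.8]}
TWEETS = [
    {"tokens": ["goal", "match"], "hashtags": {"#final"}, "mentions": {"@fifa"}},
    {"tokens": ["match", "goal"], "hashtags": {"#final"}, "mentions": set()},
    {"tokens": ["quake", "tremor"], "hashtags": {"#quake"}, "mentions": {"@usgs"}},
    {"tokens": ["tremor"], "hashtags": {"#quake"}, "mentions": {"@usgs"}},
]

clusters = mcl(similarity_graph(TWEETS, WORD_VECS, dim=3))
print(clusters)  # [[0, 1], [2, 3]]
print(pairwise_tpr_fpr(clusters, ["football", "football", "quake", "quake"]))  # (1.0, 0.0)
```

The inflation exponent controls cluster granularity in MCL: values above 2 tend to split the similarity graph into more, smaller clusters, while values closer to 1 merge them.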
Pages: 703-708
Number of pages: 6