Ensemble based spam detection in social IoT using probabilistic data structures

Times cited: 24
Authors
Singh, Amritpal [1]
Batra, Shalini [1]
Affiliation
[1] Thapar Univ, Patiala 147001, Punjab, India
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018 / Vol. 81
Keywords
Spam detection; Tweet classification; Ensemble model; Quotient Filter; Locality Sensitive Hashing; INTERNET;
DOI
10.1016/j.future.2017.09.072
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline code
081202;
Abstract
A social approach can be used for the Internet of Things (IoT) to connect a large number of objects through social networks such as Twitter, Facebook and Instagram. Social networks within the IoT domain have simplified the dynamic discovery of services and information. Detecting spam in social media, especially when massive data flows continuously and a large number of attributes are associated with it, is a daunting task that requires considerable technical insight. This paper proposes a semi-supervised technique for spam detection in Twitter that employs an ensemble-based framework comprising four classifiers. The framework uses Probabilistic Data Structures (PDS) as classifiers in its various stages: a Quotient Filter (QF) to query the URL, spam-user and spam-word databases, and Locality Sensitive Hashing (LSH) for similarity search. These structures provide fast results with less computational effort. The performance of the framework has been evaluated through a comparative analysis of the PDS against similar data structures and through standard evaluation parameters, namely precision, recall and F-score. (C) 2017 Elsevier B.V. All rights reserved.
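The abstract names a Quotient Filter as the structure used to query the URL, spam-user and spam-word databases. As a rough illustration only, the Python sketch below implements a textbook quotient filter (insert and lookup, no deletion or resizing) that could serve as a URL blacklist check. The class name, the `q_bits`/`r_bits` sizes and the BLAKE2b-based fingerprint are assumptions for illustration, not details taken from the paper.

```python
import hashlib
from typing import Tuple


class QuotientFilter:
    """Illustrative textbook quotient filter: insert and lookup only (hypothetical sizes)."""

    def __init__(self, q_bits: int = 10, r_bits: int = 8) -> None:
        self.q, self.r = q_bits, r_bits
        self.size = 1 << q_bits
        self.rem = [0] * self.size
        self.occupied = [False] * self.size      # some stored fingerprint has quotient == i
        self.continuation = [False] * self.size  # entry continues the run in the previous slot
        self.shifted = [False] * self.size       # entry is not in its canonical slot
        self.count = 0

    def _fingerprint(self, key: str) -> Tuple[int, int]:
        # Assumption: a (q + r)-bit fingerprint cut from a BLAKE2b digest.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big") & ((1 << (self.q + self.r)) - 1)
        return h >> self.r, h & ((1 << self.r) - 1)  # (quotient, remainder)

    def _inc(self, i: int) -> int:
        return (i + 1) % self.size

    def _is_empty(self, i: int) -> bool:
        return not (self.occupied[i] or self.continuation[i] or self.shifted[i])

    def _run_start(self, fq: int) -> int:
        # Walk back to the cluster start, then skip one run per occupied quotient up to fq.
        b = fq
        while self.shifted[b]:
            b = (b - 1) % self.size
        s = b
        while b != fq:
            s = self._inc(s)
            while self.continuation[s]:
                s = self._inc(s)
            b = self._inc(b)
            while not self.occupied[b]:
                b = self._inc(b)
        return s

    def contains(self, key: str) -> bool:
        fq, fr = self._fingerprint(key)
        if not self.occupied[fq]:
            return False
        s = self._run_start(fq)
        while True:  # scan the run that belongs to quotient fq
            if self.rem[s] == fr:
                return True
            s = self._inc(s)
            if not self.continuation[s]:
                return False

    def insert(self, key: str) -> bool:
        """Return True if a new fingerprint was stored, False if it was already present."""
        if self.count >= self.size - 1:  # keep one empty slot so shifting always terminates
            raise RuntimeError("quotient filter is full")
        fq, fr = self._fingerprint(key)
        if self._is_empty(fq):  # canonical slot is free: store directly
            self.rem[fq] = fr
            self.occupied[fq] = True
            self.count += 1
            return True
        was_occupied = self.occupied[fq]
        self.occupied[fq] = True
        s = self._run_start(fq)
        at_head = True
        if was_occupied:  # find the sorted position inside the existing run
            while True:
                if self.rem[s] == fr:
                    return False
                if self.rem[s] > fr:
                    break
                at_head = False
                s = self._inc(s)
                if not self.continuation[s]:
                    break
        # Place the new entry at s and shift later entries right until an empty slot.
        cur = (fr, not at_head, s != fq)  # (remainder, continuation, shifted)
        first = True
        while True:
            if self._is_empty(s):
                self.rem[s], self.continuation[s], self.shifted[s] = cur
                break
            displaced = (self.rem[s], self.continuation[s], True)
            self.rem[s], self.continuation[s], self.shifted[s] = cur
            if first and was_occupied and at_head:
                displaced = (displaced[0], True, True)  # old run head now continues the run
            cur, first = displaced, False
            s = self._inc(s)
        self.count += 1
        return True
```

In use, one might insert every known spam URL (or spam user ID, or spam word) once and call `contains` on each incoming item. Like any quotient filter, this sketch never gives a false negative for an inserted key but can give false positives when two keys share a fingerprint; each extra remainder bit roughly halves that rate at the cost of memory.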
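The abstract also names Locality Sensitive Hashing for similarity search but does not specify the hash family. A common choice for near-duplicate text is MinHash over character shingles combined with banding; the sketch below assumes that family. The shingle length `k=3`, the `bands=16, rows=4` split, the helper names and the sample tweet are illustrative assumptions, not the paper's settings.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Character k-shingles of a lower-cased, whitespace-normalised tweet (k is an assumption)."""
    t = " ".join(text.lower().split())
    if len(t) < k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def _h(seed, token):
    # One seeded 64-bit hash function per signature position.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(text, num_perm=64):
    """MinHash signature: entries match with probability approximately equal to Jaccard similarity."""
    sh = shingles(text)
    return [min(_h(seed, s) for s in sh) for seed in range(num_perm)]


class MinHashLSH:
    """Hypothetical banded LSH index over MinHash signatures."""

    def __init__(self, bands=16, rows=4):
        self.bands, self.rows = bands, rows
        self.tables = [defaultdict(set) for _ in range(bands)]

    def _band_keys(self, sig):
        r = self.rows
        return [tuple(sig[b * r:(b + 1) * r]) for b in range(self.bands)]

    def add(self, doc_id, text):
        sig = minhash(text, self.bands * self.rows)
        for table, key in zip(self.tables, self._band_keys(sig)):
            table[key].add(doc_id)

    def query(self, text):
        """Return ids of indexed texts that share at least one band bucket with `text`."""
        sig = minhash(text, self.bands * self.rows)
        hits = set()
        for table, key in zip(self.tables, self._band_keys(sig)):
            hits |= table.get(key, set())
        return hits
```

With b bands of r rows, two texts with Jaccard similarity J become candidates with probability 1 - (1 - J^r)^b under the usual min-wise hashing assumption, so the default split puts the threshold near (1/16)^(1/4) = 0.5. Candidates returned by `query` would still need an exact similarity check before a tweet is labelled as spam.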
Pages: 359-371
Page count: 13