Spammer Classification Using Ensemble Methods over Content-Based Features

Times cited: 8
Authors
Makkar, Aaisha [1]
Goel, Shivani [1]
Affiliation
[1] Thapar University, Department of Computer Science and Engineering, Patiala, Punjab, India
Source
PROCEEDINGS OF SIXTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING, SOCPROS 2016, VOL 2 | 2017 / Vol. 547
Keywords
Web spamming; Machine learning; Boosting; Ensemble
DOI
10.1007/978-981-10-3325-4_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the number of web documents grows at a large scale, accessing useful information becomes very difficult. Search engines play a major role in retrieving relevant information and knowledge, managing large amounts of information with efficient page-ranking algorithms. Still, web spammers try to intrude into search engine results with various web spamming techniques for their own benefit. According to a March 2016 report by Internetlivestats, an Internet survey company, there are currently 3.4 billion Internet users in the world, which indicates how vital a role search engines play in information retrieval. In this research, we have investigated fifteen different machine learning classification algorithms over content-based features to classify web pages as spam or non-spam. An ensemble is then built from the three algorithms that perform best on the basis of various evaluation parameters. Ten-fold cross-validation is also used.
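The abstract does not name the three best-performing algorithms, the way their outputs are combined, or the individual content-based features. As a rough illustration only, the Python sketch below evaluates a majority-vote ensemble of three simple base learners with ten-fold cross-validation. The learners (nearest centroid, 1-nearest-neighbour and a one-level decision stump), the majority-vote rule, the two numeric features per page and the toy data are all assumptions made for this example, not the paper's method or dataset.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# A fitted model maps a feature vector to a label: 1 = spam, 0 = non-spam.
Model = Callable[[Vector], int]


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Stand-in learner 1: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def fit_one_nn(X: List[Vector], y: List[int]) -> Model:
    """Stand-in learner 2: predict the label of the closest training page."""
    train = list(zip(X, y))
    return lambda x: min(train, key=lambda p: math.dist(x, p[0]))[1]


def fit_stump(X: List[Vector], y: List[int]) -> Model:
    """Stand-in learner 3: best single-feature threshold on the training data."""
    best = (-1, 0, 0.0, 1)  # (correct count, feature index, threshold, label above it)
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for above in (0, 1):
                correct = sum((above if x[j] > thr else 1 - above) == t
                              for x, t in zip(X, y))
                if correct > best[0]:
                    best = (correct, j, thr, above)
    _, j, thr, above = best
    return lambda x: above if x[j] > thr else 1 - above


def majority_vote(models: List[Model], x: Vector) -> int:
    """Assumed combination rule: simple majority (no ties with 3 binary voters)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def k_fold_splits(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Contiguous folds; each fold is the test set exactly once (no shuffling)."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [([j for j in range(n) if not bounds[i] <= j < bounds[i + 1]],
             list(range(bounds[i], bounds[i + 1]))) for i in range(k)]


def cross_validated_accuracy(X: List[Vector], y: List[int], k: int = 10) -> float:
    """Pooled k-fold accuracy of the majority-vote ensemble."""
    correct = 0
    for train, test in k_fold_splits(len(X), k):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Re-fit all three base learners on the training folds only.
        models = [fit(X_tr, y_tr) for fit in (fit_nearest_centroid, fit_one_nn, fit_stump)]
        correct += sum(majority_vote(models, X[i]) == y[i] for i in test)
    return correct / len(X)


# Hypothetical toy data: two made-up numeric features per page.
toy_X = ([[0.20 + 0.01 * i, 4.0 + 0.05 * i] for i in range(10)]
         + [[0.60 + 0.01 * i, 6.0 + 0.05 * i] for i in range(10)])
toy_y = [0] * 10 + [1] * 10
print(cross_validated_accuracy(toy_X, toy_y, k=10))  # -> 1.0
```

On this toy data the stump misclassifies the two non-spam pages held out in the fifth fold, because their feature values lie above the threshold it learns from the remaining pages, but the other two voters outvote it; this is the kind of error-cancelling effect an ensemble relies on. Real data would call for shuffled, ideally stratified, folds instead of the contiguous ones used here.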
Pages: 1-9
Page count: 9