Using GMDH-based networks for improved spam detection and email feature analysis

Cited by: 22
Authors
El-Alfy, El-Sayed M. [1]
Abdel-Aal, Radwan E. [1]
Affiliation
[1] King Fahd Univ Petr & Minerals, Coll Comp Sci & Engn, Dhahran 31261, Saudi Arabia
Keywords
Group method of data handling; GMDH-based networks; Soft computing; Neural networks; Bayesian classifiers; Spam detection; Spam filtering; Feature selection; Network ensembles; Network committees; Classification
DOI
10.1016/j.asoc.2009.12.007
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsolicited or spam email has recently become a major threat that can negatively impact the usability of electronic mail. Spam substantially wastes time and money for business users and network administrators, consumes network bandwidth and storage space, and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content. In this paper, we explore the application of the GMDH (Group Method of Data Handling)-based inductive learning approach to detecting spam messages by automatically identifying content features that effectively distinguish spam from legitimate email. We study performance across various network model complexities using Spambase, a publicly available benchmark dataset. Results show that a classification accuracy of 91.7% can be achieved using only 10 of the 57 available attributes, selected through abductive learning as the most effective feature subset (i.e., an 82.5% data reduction). We also show how to improve classification performance using abductive network ensembles (committees) trained on different subsets of the training data. Comparison with other techniques such as neural networks and naive Bayesian classifiers shows that the GMDH-based learning approach can provide better spam detection accuracy, with false-positive rates as low as 4.3%, while requiring shorter training time. (C) 2009 Elsevier B.V. All rights reserved.
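The GMDH feature-selection and committee ideas in the abstract can be illustrated with a minimal single-layer sketch in Python with NumPy. It assumes classic Ivakhnenko-style quadratic neurons fitted by least squares on every pair of inputs, ranked by error on a separate validation set, plus a majority-vote committee of members trained on different training subsets. The function names, the 0.5 decision threshold, and the synthetic five-feature data are hypothetical stand-ins; this is not the authors' tool, their network configuration, or the Spambase data.

```python
import itertools
import numpy as np


def quad_terms(a, b):
    """Ivakhnenko quadratic polynomial terms for an input pair: 1, a, b, ab, a^2, b^2."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a * a, b * b])


def fit_layer(X_tr, y_tr, X_va, y_va, keep=3):
    """Fit one quadratic neuron per input pair by least squares on the training set,
    then rank neurons by validation MSE (the external selection criterion)."""
    neurons = []
    for i, j in itertools.combinations(range(X_tr.shape[1]), 2):
        coef, *_ = np.linalg.lstsq(quad_terms(X_tr[:, i], X_tr[:, j]), y_tr, rcond=None)
        pred = quad_terms(X_va[:, i], X_va[:, j]) @ coef
        neurons.append((float(np.mean((pred - y_va) ** 2)), i, j, coef))
    neurons.sort(key=lambda n: n[0])
    return neurons[:keep]  # surviving neurons, best first


def selected_features(neurons):
    """The raw inputs feeding the surviving neurons form the selected feature subset."""
    return sorted({k for _, i, j, _ in neurons for k in (i, j)})


def predict(neuron, X):
    """Classify with one neuron: outputs above 0.5 are labelled spam (1)."""
    _, i, j, coef = neuron
    return (quad_terms(X[:, i], X[:, j]) @ coef > 0.5).astype(int)


def committee_predict(members, X):
    """Majority vote over members, each trained on a different training subset."""
    votes = np.stack([predict(m, X) for m in members])
    return (votes.mean(axis=0) > 0.5).astype(int)


def label(X):
    """Hypothetical synthetic target: 'spam' depends only on features 0 and 2."""
    return (X[:, 0] + X[:, 2] > 1).astype(float)


rng = np.random.default_rng(0)
X_tr, X_va, X_te = (rng.random((300, 5)) for _ in range(3))

best = fit_layer(X_tr, label(X_tr), X_va, label(X_va), keep=1)
print(selected_features(best))  # expected: [0, 2]

# Committee: three members, each fitted on a disjoint third of the training rows.
members = [
    fit_layer(X_tr[k::3], label(X_tr[k::3]), X_va, label(X_va), keep=1)[0]
    for k in range(3)
]
print(np.mean(committee_predict(members, X_te) == label(X_te)))
```

A full GMDH network would feed the surviving neurons' outputs into further layers until the validation error stops improving. The selected feature subset is then whatever raw inputs the final network still depends on, which is how a model can end up using only a fraction of the available attributes.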
Pages: 477-488
Number of pages: 12
Related papers (50 total)
  • [1] GMDH-based networks for intelligent intrusion detection
    Baig, Zubair A.
    Sait, Sadiq M.
    Shaheen, AbdulRahman
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (07) : 1731 - 1740
  • [2] GMDH-based feature ranking and selection for improved classification of medical data
    Abdel-Aal, RE
    JOURNAL OF BIOMEDICAL INFORMATICS, 2005, 38 (06) : 456 - 468
  • [3] Improving Email Spam Detection Using Content Based Feature Engineering Approach
    Hijawi, Wadi'
    Faris, Hossam
    Alqatawna, Ja'far
    Al-Zoubi, Ala' M.
    Aljarah, Ibrahim
    2017 IEEE JORDAN CONFERENCE ON APPLIED ELECTRICAL ENGINEERING AND COMPUTING TECHNOLOGIES (AEECT), 2017
  • [4] A Novel Approach for Face Recognition Using Fused GMDH-Based Networks
    El-Alfy, El-Sayed
    Baig, Zubair
    Abdel-Aal, Radwan
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2018, 15 (03) : 369 - 377
  • [5] Email Spam Detection Using Machine Learning and Feature Optimization Method
    Grewal, Naseeb
    Nijhawan, Rahul
    Mittal, Ankush
    DISTRIBUTED COMPUTING AND OPTIMIZATION TECHNIQUES, ICDCOT 2021, 2022, 903 : 435 - 447
  • [6] Feature Selection Using Hybrid Metaheuristic Algorithm for Email Spam Detection
    Al-Rawashdeh, Ghada Hammad
    Khashan, Osama A.
    Al-Rawashde, Jawad
    Al-Gasawneh, Jassim Ahmad
    Alsokkar, Abdullah
    Alshinwa, Mohammad
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2024, 24 (02) : 156 - 171
  • [7] Detection of Zombie PCs Based on Email Spam Analysis
    Jeong, HyunCheol
    Kim, Huy Kang
    Lee, Sangjin
    Kim, Eunjin
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2012, 6 (05) : 1445 - 1462
  • [8] Improved machine learning technique for feature reduction and its application in spam email detection
    Ewees, Ahmed A.
    Gaheen, Marwa A.
    Alshahrani, Mohammed M.
    Anter, Ahmed M.
    Ismail, Fatma H.
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (06) : 1749 - 1771
  • [9] GMDH-Based Outlier Detection Model in Classification Problems
    Xie, Ling
    Jia, Yanlin
    Xiao, Jin
    Gu, Xin
    Huang, Jing
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2020, 33 (05) : 1516 - 1532
  • [10] Improved email spam detection model based on support vector machines
    Olatunji, Sunday Olusanya
    NEURAL COMPUTING AND APPLICATIONS, 2019, 31 : 691 - 699