Detecting phishing e-mails using Text and Data mining

Cited by: 0
Authors
Pandey, Mayank [1 ]
Ravi, Vadlamani [1 ]
Affiliation
[1] Inst Dev & Res Banking Technol, Hyderabad, Andhra Pradesh, India
Source
2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC) | 2012
Keywords
Multilayer Perceptron; Decision Tree; Logistic regression; Support Vector Machine; Group Method Of Data Handling; Phishing webpage; Probabilistic Neural Network; Genetic Programming; Text mining; Classification; ATTACKS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper applies text mining and data mining in tandem to detect phishing e-mails. The study employs Multilayer Perceptron (MLP), Decision Trees (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), Probabilistic Neural Network (PNN), Genetic Programming (GP) and Logistic Regression (LR) for classification. A dataset of 2500 phishing and non-phishing e-mails is analyzed after 23 keywords are extracted from the e-mail bodies using text mining. The 12 most important features are then selected using t-statistic-based feature selection. A t-test at the 1% level of significance shows no statistically significant difference in sensitivity with and without feature selection for any technique except PNN. Since GP and DT are not statistically significantly different at the 1% level of significance, either with or without feature selection, DT should be preferred because it yields 'if-then' rules, thereby increasing the comprehensibility of the system.
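The abstract does not give the formula behind its t-statistic-based feature selection, so the following is only a minimal sketch of one common form: each keyword feature is scored by the absolute difference of its class means divided by the unequal-variance standard error, and the top-scoring features are kept. The function names `t_statistic` and `select_top_k`, the toy keyword counts, and the choice of the unequal-variance form are all assumptions made for illustration, not details taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(pos, neg):
    """Absolute two-sample t-statistic (unequal-variance form, assumed here)."""
    se = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0.0:
        # Both classes are constant: the feature either separates perfectly or not at all.
        return 0.0 if mean(pos) == mean(neg) else float("inf")
    return abs(mean(pos) - mean(neg)) / se


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the largest t-statistic."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]  # phishing
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]  # non-phishing
        scores.append((t_statistic(pos, neg), j))
    # Highest score first; ties broken by feature index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Hypothetical keyword counts: one row per e-mail, one column per keyword.
rows = [
    [3, 1, 0],
    [4, 0, 1],
    [5, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = non-phishing

print(select_top_k(rows, labels, k=2))
```

In the setting described by the abstract, `rows` would hold the 23 extracted keyword features per e-mail and `k` would be 12; the reduced feature set would then be passed to each classifier.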
Pages: 249 - 254
Page count: 6
Related papers
50 in total
  • [1] Using text classification and multiple concepts to answer e-mails
    Weng, SS
    Liu, CK
    EXPERT SYSTEMS WITH APPLICATIONS, 2004, 26 (04) : 529 - 543
  • [2] Exposing the Phish: The Effect of Persuasion Techniques in Phishing E-Mails
    Koddebusch, Michael
    PROCEEDINGS OF THE 23RD ANNUAL INTERNATIONAL CONFERENCE ON DIGITAL GOVERNMENT RESEARCH, DGO 2022: Intelligent Technologies, Governments and Citizens, 2022 : 78 - 87
  • [3] Mining writeprints from anonymous e-mails for forensic investigation
    Iqbal, Farkhund
    Binsalleeh, Hamad
    Fung, Benjamin C. M.
    Debbabi, Mourad
    DIGITAL INVESTIGATION, 2010, 7 (1-2) : 56 - 64
  • [4] An ensemble approach applied to classify spam e-mails
    Ying, Kuo-Ching
    Lin, Shih-Wei
    Lee, Zne-Jung
    Lin, Yen-Tim
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (03) : 2197 - 2201
  • [5] Detecting Spam E-mails with Content and Weight-based Binomial Logistic Model
    Indu, Richa
    Dimri, Sushil Chandra
    JOURNAL OF WEB ENGINEERING, 2023, 22 (07) : 939 - 959
  • [6] Trends in Combating Image Spam E-mails
    Khanum, Mohammadi Akheela
    Ketari, Lamia Mohammed
    FUTURE INFORMATION TECHNOLOGY, 2011, 13 : 78 - 84
  • [7] A Systematic Review: Detecting Phishing Websites Using Data Mining Models
    Jibat D.
    Jamjoom S.
    Al-Haija Q.A.
    Qusef A.
    Intelligent and Converged Networks, 2023, 4 (04) : 326 - 341
  • [8] Social network based filtering of unsolicited messages from e-mails
    Kiliroor, Cinu C.
    Valliyammai, C.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (05) : 4037 - 4048
  • [9] Text and Data Mining to Detect Phishing Websites and Spam Emails
    Pandey, Mayank
    Ravi, Vadlamani
    SWARM, EVOLUTIONARY, AND MEMETIC COMPUTING, PT II (SEMCCO 2013), 2013, 8298 : 559 - 573
  • [10] Soft computing based imputation and hybrid data and text mining: The case of predicting the severity of phishing alerts
    Nishanth, Kancherla Jonah
    Ravi, Vadlamani
    Ankaiah, Narravula
    Bose, Indranil
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (12) : 10583 - 10589