Detecting phishing e-mails using Text and Data mining

Cited by: 0

Authors:

Pandey, Mayank [1]

Ravi, Vadlamani [1]

Affiliation:

[1] Institute for Development and Research in Banking Technology (IDRBT), Hyderabad, Andhra Pradesh, India

Source:

2012 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), 2012

Keywords:

Multilayer Perceptron; Decision Tree; Logistic Regression; Support Vector Machine; Group Method of Data Handling; Phishing webpage; Probabilistic Neural Network; Genetic Programming; Text mining; Classification; Attacks

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper combines text mining and data mining to detect phishing e-mails. The study employs Multilayer Perceptron (MLP), Decision Trees (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), Probabilistic Neural Network (PNN), Genetic Programming (GP) and Logistic Regression (LR) for classification. A dataset of 2,500 phishing and non-phishing e-mails is analyzed after 23 keywords are extracted from the e-mail bodies by text mining. The 12 most important of these features are then selected using t-statistic-based feature selection. A t-test at the 1% level of significance found no statistically significant difference in sensitivity across all techniques except PNN, both with and without feature selection. Because GP and DT do not differ significantly at the 1% level, either with or without feature selection, DT should be preferred: it yields 'if-then' rules, which make the system more comprehensible.
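The abstract does not give the keyword list or the exact t-statistic formula, so the following Python sketch of the feature-extraction and feature-selection steps rests on stated assumptions. Each e-mail is represented by keyword counts (the four keywords and six toy e-mails are hypothetical, not the paper's 23 keywords or its 2,500-e-mail dataset). Each feature is scored with a Welch-style t-statistic, |mean_phishing - mean_legitimate| / sqrt(var_phishing/n_phishing + var_legitimate/n_legitimate), and the k highest-scoring features are kept. The sketch also shows how sensitivity, the metric compared in the abstract, is computed; the seven classifiers themselves are not reproduced.

```python
import math
import re
from statistics import mean, variance

# Hypothetical keywords and toy e-mails (1 = phishing, 0 = legitimate);
# they only illustrate the pipeline and are not taken from the paper.
KEYWORDS = ["verify", "account", "click", "meeting"]
EMAILS = [
    ("Please verify your account now, click here to verify", 1),
    ("Click to verify account password", 1),
    ("Your account will be suspended, click the link", 1),
    ("Agenda for the team meeting tomorrow", 0),
    ("Meeting notes and account summary attached", 0),
    ("Lunch meeting moved to noon", 0),
]


def keyword_counts(body, keywords):
    """Text-mining step (assumed form): count each keyword in the e-mail body."""
    words = re.findall(r"[a-z]+", body.lower())
    return [words.count(k) for k in keywords]


def t_statistic(pos, neg):
    """Welch-style t-statistic magnitude of one feature between two classes (assumed formula)."""
    diff = abs(mean(pos) - mean(neg))
    denom = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if denom == 0:
        # Constant within each class: perfectly separating if the means differ.
        return math.inf if diff > 0 else 0.0
    return diff / denom


def select_top_k(rows, labels, feature_names, k):
    """Rank features by t-statistic (highest first) and keep the top k names."""
    scores = {}
    for j, name in enumerate(feature_names):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        scores[name] = t_statistic(pos, neg)
    # sorted() is stable even with reverse=True, so ties keep keyword order.
    return sorted(feature_names, key=lambda n: scores[n], reverse=True)[:k]


def sensitivity(y_true, y_pred):
    """True-positive rate: fraction of phishing e-mails (label 1) flagged as phishing."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


# Feature matrix (one row of keyword counts per e-mail) and class labels.
rows = [keyword_counts(body, KEYWORDS) for body, _ in EMAILS]
labels = [label for _, label in EMAILS]

if __name__ == "__main__":
    print(select_top_k(rows, labels, KEYWORDS, 2))  # → ['click', 'meeting']
    print(sensitivity([1, 1, 1, 0], [1, 0, 1, 0]))
```

In the toy data, "click" and "meeting" never overlap between the classes, so they receive an infinite score and rank first. Under the assumptions above, calling select_top_k with k=12 on 23 keyword columns would mirror the abstract's reduction from 23 to 12 features.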


Pages: 249-254

Number of pages: 6
