Detecting phishing e-mails using Text and Data mining

Cited by: 0
Authors
Pandey, Mayank [1]
Ravi, Vadlamani [1]
Affiliations
[1] Inst Dev & Res Banking Technol, Hyderabad, Andhra Pradesh, India
Source
2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC) | 2012
Keywords
Multilayer Perceptron; Decision Tree; Logistic regression; Support Vector Machine; Group Method Of Data Handling; Phishing webpage; Probabilistic Neural Network; Genetic Programming; Text mining; Classification; ATTACKS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper uses text mining and data mining in tandem to detect phishing emails. The study employs Multilayer Perceptron (MLP), Decision Tree (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), Probabilistic Neural Net (PNN), Genetic Programming (GP) and Logistic Regression (LR) for classification. A dataset of 2500 phishing and non-phishing emails is analyzed after text mining is used to extract 23 keywords from the email bodies of the original dataset. Further, we selected the 12 most important features using t-statistic-based feature selection. Here, we found no statistically significant difference in sensitivity, as indicated by a t-test at the 1% level of significance, both with and without feature selection, across all techniques except PNN. Since GP and DT are not statistically significantly different, either with or without feature selection, at the 1% level of significance, DT should be preferred because it yields 'if-then' rules, thereby increasing the comprehensibility of the system.
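The t-statistic-based feature selection mentioned in the abstract can be illustrated with a minimal sketch. The record does not give the exact formula the authors used, so this assumes a Welch two-sample t-statistic computed per feature between the phishing and non-phishing classes, with features ranked by absolute value. The function names `t_statistic` and `select_top_k` are hypothetical, and the data is synthetic, not the paper's dataset.

```python
import math
import random
from statistics import mean, variance


def t_statistic(pos, neg):
    """Welch two-sample t-statistic between two lists of feature values."""
    diff = mean(pos) - mean(neg)
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0.0:
        # Both classes are constant: no separation if the means agree,
        # perfect separation otherwise.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the largest |t|, best first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(t_statistic(pos, neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Synthetic stand-in data (an assumption for illustration): 23 keyword
# features, of which only the first 3 actually differ between the classes.
random.seed(0)
labels = [i % 2 for i in range(200)]
rows = [
    [random.gauss(2.0 * y if j < 3 else 0.0, 1.0) for j in range(23)]
    for y in labels
]

selected = select_top_k(rows, labels, 12)
print(selected)
```

The retained column indices would then be used to build the reduced feature matrix passed to each classifier, which allows the with-selection and without-selection results to be compared.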
Pages: 249-254
Page count: 6
Related papers
50 in total
  • [21] Detecting and Classifying Crimes from Arabic Twitter Posts using Text Mining Techniques
    Al-Saif, Hissah
    Al-Dossari, Hmood
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (10) : 377 - 387
  • [22] Interdependence of Text Mining Quality and the Input Data Preprocessing
    Darena, Frantisek
    Zizka, Jan
    ARTIFICIAL INTELLIGENCE PERSPECTIVES AND APPLICATIONS (CSOC2015), 2015, 347 : 141 - 150
  • [23] Application of Text Mining in Detecting Evidence of Fraud in Text Documents
    Silva, Elcelina
    2017 12TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI), 2017,
  • [24] Data Mining and Text Mining - A Survey
    Suresh, R.
    Harshni, S. R.
    2017 INTERNATIONAL CONFERENCE ON COMPUTATION OF POWER, ENERGY INFORMATION AND COMMUNICATION (ICCPEIC), 2017, : 412 - 419
  • [25] Phishing detection based Associative Classification data mining
    Abdelhamid, Neda
    Ayesh, Aladdin
    Thabtah, Fadi
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (13) : 5948 - 5959
  • [26] Applying text and data mining techniques to forecasting the trend of petitions filed to e-People
    Suh, Jong Hwan
    Park, Chung Hoon
    Jeon, Si Hyun
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (10) : 7255 - 7268
  • [27] Text Mining Technique for Data Mining Application
    Govindarajan, M.
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 26, PARTS 1 AND 2, DECEMBER 2007, 2007, 26 : 544 - 549
  • [28] Data Analysis Support by Combining Data Mining and Text Mining
    Matsumoto, Tomoya
    Sunayama, Wataru
    Hatanaka, Yuji
    Ogohara, Kazunori
    2017 6TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI), 2017, : 313 - 318
  • [29] Progress Towards a Smarter Office via a Novel Intelligent System for Message Organisation by Unifying E-Mails & Phone Calls
    Hunter, Gordon
    Denholm-Price, James
    Michel, Thomas
    Yardley, John
    Fox, David
    WORKSHOP PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT ENVIRONMENTS, 2015, 19 : 15 - 26
  • [30] Document Classification of Filipino Online Scam Incident Text using Data Mining Techniques
    Palad, Eddie Bouy B.
    Tangkeko, Marivic S.
    Magpantay, Lissa Andrea K.
    Sipin, Glenn L.
    ISCIT 2019: PROCEEDINGS OF 2019 19TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT), 2019, : 232 - 237