Email Classification Research Trends: Review and Open Issues

Cited by: 0
Authors
Mujtaba, Ghulam [1, 2]
Shuib, Liyana [1]
Raj, Ram Gopal [3]
Majeed, Nahdia [2]
Al-Garadi, Mohammed Ali [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
[2] Sukkur Inst Business Adm, Dept Comp Sci, Sukkur 65200, Pakistan
[3] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Artificial Intelligence, Kuala Lumpur 50603, Malaysia
Source
IEEE ACCESS | 2017 / Vol. 5
Keywords
Email classification; spam detection; phishing detection; multi-folder categorization; machine learning techniques; E-MAIL CLASSIFICATION; FEATURE-SELECTION; SPAM; ANALYZER; FEATURES; MODEL;
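DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract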
Personal and business users prefer to use e-mail as one of the crucial sources of communication. The usage and importance of e-mails continuously grow despite the prevalence of alternative means, such as electronic messages, mobile applications, and social networks. As the volume of business-critical e-mails continues to grow, the need to automate the management of e-mails increases for several reasons, such as spam e-mail classification, phishing e-mail classification, and multi-folder categorization, among others. This paper comprehensively reviews articles on e-mail classification published in 2006-2016 by exploiting the methodological decision analysis in five aspects, namely, e-mail classification application areas, data sets used in each application area, feature space utilized in each application area, e-mail classification techniques, and the use of performance measures. A total of 98 articles (56 articles from Web of Science core collection databases and 42 articles from Scopus database) are selected. To achieve the objective of the study, a comprehensive review and analysis is conducted to explore the various areas where e-mail classification was applied. Moreover, various public data sets, features sets, classification techniques, and performance measures are examined and used in each identified application area. This review identifies five application areas of e-mail classification. The most widely used data sets, features sets, classification techniques, and performance measures are found in the identified application areas. The extensive use of these popular data sets, features sets, classification techniques, and performance measures is discussed and justified. The research directions, research challenges, and open issues in the field of e-mail classification are also presented for future researchers.
Pages: 9044-9064
Page count: 21
Related papers
120 in total
  • [1] Abu-Nimeh S., 2007, P ANT WORK GROUPS 2, P60, DOI 10.1145/1299015.1299021
  • [2] Classification of Phishing Email Using Random Forest Machine Learning Technique
    Akinyelu, Andronicus A.
    Adewumi, Aderemi O.
    [J]. JOURNAL OF APPLIED MATHEMATICS, 2014,
  • [3] Al Fe'ar N., 2008, E CLASSIFIER BILINGU
  • [4] Al Sallab Ahmad A., 2012, Journal of Theoretical and Applied Information Technology, V37, P241
  • [5] Email pragmatics and automatic classification: A study in the organizational context
    Alberts, Inge
    Forest, Dominic
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2012, 63 (05): 904 - 922
  • [6] A Survey of Phishing Email Filtering Techniques
    Almomani, Ammar
    Gupta, B. B.
    Atawneh, Samer
    Meulenberg, A.
    Almomani, Eman
    [J]. IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2013, 15 (04): 2070 - 2090
  • [7] [Anonymous], 2016, NAT METHODS, DOI 10.1038/nmeth.3707
  • [8] [Anonymous], P 46 ANN SE REG C 20
  • [9] Data mining based intelligent analysis of threatening e-mail
    Appavu, S.
    Rajaram, R.
    Muthupandian, M.
    Athiappan, G.
    Kashmeera, K. S.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2009, 22 (05): 392 - 393
  • [10] Ayodele Taiwo, 2007, IET Conference on Wireless, Mobile and Sensor Networks 2007 (CCWMSN07), P805, DOI 10.1049/cp:20070271