An Adaptive Three-way Spam Filter with Similarity Measure

Authors
Xie Q. [1]
Zhang Q. [1]
Wang G. [1]
Affiliation
[1] Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing
Source
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2019, Vol. 56, No. 11
Funding
National Natural Science Foundation of China
Keywords
Decision-theoretic rough sets; Similarity measure; Spam filtering; Three-way decisions; Thresholds
DOI
10.7544/issn1000-1239.2019.20180793
Abstract
Spam filtering is an important problem in the information age: if an important email is misclassified, the resulting cost can be immeasurable. Improving the accuracy and recall of filters is therefore the key issue in spam filtering. At present, spam filtering is usually handled with binary classification models from machine learning. Compared with three-way decisions, however, binary classification usually incurs a higher misclassification cost. Three-way decisions with decision-theoretic rough sets, an important branch of three-way decisions, can effectively reduce the misclassification cost and further improve filter performance, and they also conform to human cognition. Nevertheless, few studies consider how the differences among equivalence classes affect the classification results when the loss functions are constructed. Therefore, within the framework of three-way decisions with decision-theoretic rough sets, an adaptive three-way spam filter with similarity measure is proposed. The model first calculates the weights of the condition attributes according to set variance. It then establishes a comprehensive evaluation function, based on a set similarity measure, that describes the difference information among equivalence classes. Finally, it constructs an adaptive model that calculates the threshold pairs based on Bayesian decision rules. Experimental results show that the proposed model performs well in spam filtering. © 2019, Science Press. All rights reserved.
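The abstract builds on the standard decision-theoretic rough set (DTRS) model, in which a threshold pair (alpha, beta) is derived from six loss values by minimizing expected loss under Bayesian decision rules, and each equivalence class is then assigned to the positive (spam), negative (ham), or boundary (defer) region. The Python sketch below illustrates only that standard, non-adaptive model under stated assumptions; it does not reproduce the authors' variance-based attribute weights, similarity-based evaluation function, or adaptive threshold computation, which the abstract does not specify. The names `Losses`, `dtrs_thresholds`, `three_way`, `classify_equivalence_classes`, and all example loss values and data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Illustrative sketch of the standard DTRS three-way rule (Yao's model);
# NOT the adaptive, similarity-weighted model proposed in this paper.


@dataclass(frozen=True)
class Losses:
    """Six DTRS loss values. First letter: action taken (P = mark as spam,
    B = defer, N = mark as ham); second letter: true state (P = email is
    spam, N = email is ham)."""
    lpp: float
    lbp: float
    lnp: float
    lpn: float
    lbn: float
    lnn: float


def dtrs_thresholds(lam: Losses) -> tuple[float, float]:
    """Return (alpha, beta) by minimizing expected loss (Bayesian decision)."""
    # Usual DTRS ordering assumptions; they keep both denominators positive.
    if not (lam.lpp <= lam.lbp < lam.lnp and lam.lnn <= lam.lbn < lam.lpn):
        raise ValueError("losses violate lpp <= lbp < lnp or lnn <= lbn < lpn")
    alpha = (lam.lpn - lam.lbn) / ((lam.lpn - lam.lbn) + (lam.lbp - lam.lpp))
    beta = (lam.lbn - lam.lnn) / ((lam.lbn - lam.lnn) + (lam.lnp - lam.lbp))
    if not beta < alpha:
        raise ValueError("losses do not yield alpha > beta (no boundary region)")
    return alpha, beta


def three_way(p_spam: float, alpha: float, beta: float) -> str:
    """Three-way rule on the conditional probability P(spam | [x])."""
    if p_spam >= alpha:
        return "spam"   # positive region: accept as spam
    if p_spam <= beta:
        return "ham"    # negative region: reject, deliver as ham
    return "defer"      # boundary region: defer for further checking


def classify_equivalence_classes(rows, labels, lam: Losses) -> dict:
    """Group emails into equivalence classes (identical condition-attribute
    values), estimate P(spam | class) by frequency, then apply the rule."""
    alpha, beta = dtrs_thresholds(lam)
    counts = defaultdict(lambda: [0, 0])  # class key -> [spam count, size]
    for row, is_spam in zip(rows, labels):
        counts[row][0] += int(is_spam)
        counts[row][1] += 1
    return {key: three_way(s / n, alpha, beta) for key, (s, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical losses: wrongly marking ham as spam (lpn) is the costliest.
    lam = Losses(lpp=0, lbp=2, lnp=6, lpn=10, lbn=1, lnn=0)
    print(dtrs_thresholds(lam))  # alpha = 9/11, beta = 0.2
    rows = [("a", 1), ("a", 1), ("a", 1), ("b", 0), ("b", 0), ("c", 1), ("c", 1)]
    labels = [1, 1, 1, 0, 0, 1, 0]
    print(classify_equivalence_classes(rows, labels, lam))
    # {('a', 1): 'spam', ('b', 0): 'ham', ('c', 1): 'defer'}
```

With these assumed losses, a class whose emails are all spam falls in the positive region, an all-ham class falls in the negative region, and a mixed class (P(spam) = 0.5) is deferred rather than forced into a binary decision. The paper's contribution, as described in the abstract, is to make the losses, and hence the threshold pair, adapt to differences among equivalence classes instead of using one fixed `Losses` value as this sketch does.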
Pages: 2410-2423
Page count: 13