共 48 条
Catching the drift: Using feature-free case-based reasoning for spam filtering
被引:0
|作者:
Delany, Sarah Jane
[1
]
Bridge, Derek
[2
]
机构:
[1] Dublin Inst Technol, Dublin, Ireland
[2] Univ Coll Cork, Cork, Ireland
来源:
CASE-BASED REASONING RESEARCH AND DEVELOPMENT, PROCEEDINGS
|
2007年
/
4626卷
关键词:
TRACKING CONCEPT DRIFT;
SELECTION;
D O I:
暂无
中图分类号:
TP18 [人工智能理论];
学科分类号:
081104 ;
0812 ;
0835 ;
1405 ;
摘要:
In this paper, we compare case-based spam filters, focusing on their resilience to concept drift. In particular, we evaluate how to track concept drift using a case-based spam filter that uses a feature-free distance measure based on text compression. In our experiments, we compare two ways to normalise such a distance measure, finding that the one proposed in [1] performs better. We show that a policy as simple as retaining misclassified examples has a hugely beneficial effect on handling concept drift in spam but, on its own, it results in the case base growing by over 30%. We then compare two different retention policies and two different forgetting policies (one a form of instance selection, the other a form of instance weighting) and find that they perform roughly as well as each other while keeping the case base size constant. Finally, we compare a feature-based textual case-based spam filter with our feature-free approach. In the face of concept drift, the feature-based approach requires the case base to be rebuilt periodically so that we can select a new feature set that better predicts the target concept. We find feature-free approaches to have lower error rates than their feature-based equivalents.
引用
收藏
页码:314 / +
页数:4
相关论文