Statistical and structural approaches to filtering Internet pornography

Cited by: 0
Authors
Ho, WH [1]
Watters, PA [1]
Affiliations
[1] Macquarie Univ, Div ICS, Postgrad Prof Dev Program, Sydney, NSW 2109, Australia
Source
2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7 | 2004
Keywords
pornography; content filtering; text mining; simple keyword search; Bayesian classification
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The WWW is a major source of unintentional exposure to pornography. Current content filtering technology using blacklisting or simple keyword searching is ineffective: today's filters produce many false positives and false negatives, and require tedious manual updating. This study examined how content filtering of pornographic web page text, based on structural and statistical analysis, could greatly improve accuracy. Systematic differences between pornographic and non-pornographic web pages were found, with Bayesian classification yielding 99.1% accuracy in text classification from pornographic and non-pornographic corpora.
Pages: 4792-4798
Page count: 7