Detecting sexual predators in chats using behavioral features and imbalanced learning

Cited by: 12
Authors
Cardei, Claudia [1]
Rebedea, Traian [1]
Affiliation
[1] Univ Politehn Bucuresti, Dept Comp Sci, Bucharest 060042, Romania
Keywords
YOUTH;
DOI
10.1017/S1351324916000395
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a system for detecting sexual predators in online chat conversations using two-stage classification and behavioral features. A sexual predator is defined as a person who tries to obtain sexual favors in a predatory manner, usually from underage people. The proposed approach combines several text categorization methods with empirical behavioral features developed specifically for this task. After investigating various approaches to the sexual predator identification problem, we found that a two-stage classifier achieves the best results. In the first stage, we employ a Support Vector Machine classifier to distinguish conversations with suspicious content from safe online discussions. Because most real-life chat conversations do not involve a sexual predator, this stage acts as a filter: the actual detection of predators is performed only on suspicious chats, which contain a sexual predator with very high probability. In the second stage, we use a Random Forest classifier to determine which of the users in a suspicious discussion is the actual predator. The system was tested on the corpus provided by the PAN 2012 workshop organizers, and the results are encouraging: as far as we know, our solution outperforms all previous approaches developed for this task.
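The two-stage control flow described in the abstract can be illustrated with a minimal sketch. It shows only the cascade structure: a conversation-level filter followed by a per-author decision on the conversations that pass the filter. Everything named here (`two_stage_detect`, `toy_is_suspicious`, `toy_is_predator`, the `FLAGGED` placeholder token, and the data layout) is a hypothetical assumption for illustration, not the authors' implementation. In the system described above, the first callable would be a trained Support Vector Machine and the second a trained Random Forest over text and behavioral features.

```python
from typing import Callable, Dict, List

# Assumed layout: a conversation maps each author ID to the messages they sent.
Conversation = Dict[str, List[str]]


def two_stage_detect(
    conversations: Dict[str, Conversation],
    is_suspicious: Callable[[Conversation], bool],
    is_predator: Callable[[str, Conversation], bool],
) -> Dict[str, List[str]]:
    """Return {conversation_id: [flagged author IDs]} for suspicious chats only."""
    flagged: Dict[str, List[str]] = {}
    for conv_id, conv in conversations.items():
        # Stage 1: conversation-level filter; safe chats are skipped entirely.
        if not is_suspicious(conv):
            continue
        # Stage 2: per-author decision, run only on suspicious conversations.
        flagged[conv_id] = [author for author in conv if is_predator(author, conv)]
    return flagged


# Hypothetical stand-ins for the trained models (placeholder token matching).
def toy_is_suspicious(conv: Conversation) -> bool:
    return any("FLAGGED" in msg for msgs in conv.values() for msg in msgs)


def toy_is_predator(author: str, conv: Conversation) -> bool:
    return any("FLAGGED" in msg for msg in conv[author])


chats = {
    "c1": {"alice": ["hi", "how are you"], "bob": ["fine"]},
    "c2": {"carol": ["hello"], "dave": ["FLAGGED text"]},
}
result = two_stage_detect(chats, toy_is_suspicious, toy_is_predator)
print(result)
```

Passing the two stages as callables keeps the cascade independent of any particular model, so the stand-ins can be swapped for real classifiers without changing the control flow.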
Pages: 589-616
Page count: 28