Model-based exception mining for object-relational data

Cited by: 0
Authors
Fatemeh Riahi
Oliver Schulte
Affiliation
[1] Simon Fraser University
Source
Data Mining and Knowledge Discovery | 2020 / Vol. 34
Keywords
Outlier detection; Exception mining; Statistical-relational learning; Bayesian network; Likelihood ratio; Network data
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has been previously applied for subgroup discovery in propositional data; our novel contribution is to develop EMM for relational data. EMM leverages machine learning models for exception mining: an object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. In relational data, EMM can therefore be used for detecting single outlier or exceptional objects. We combine EMM with state-of-the-art statistical-relational model discovery methods for constructing a graphical model (Bayesian network) that compactly represents probabilistic associations in the data. We investigate several outlierness metrics, based on the learned object-relational model, that quantify the extent to which the association pattern of a potential outlier object deviates from that of the whole population. Our method is validated on synthetic data sets and on real-world data sets about soccer and hockey matches, IMDb movies and mutagenic compounds. Compared to baseline methods, the EMM approach achieved the best detection accuracy when combined with a novel outlierness metric. An empirical evaluation on soccer and movie data shows a strong correlation between our novel outlierness metric and success metrics: individuals that our metric marks out as unusual tend to have unusual success.
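The core EMM idea in the abstract (score an object by how far a model fitted to its own data departs from a model fitted to the whole population) can be illustrated with a deliberately simplified sketch. This is not the paper's method: it assumes a single categorical attribute with a Laplace-smoothed frequency model in place of a learned Bayesian network, and it uses an average log-likelihood ratio as a stand-in for the paper's outlierness metrics. The function names, the `alpha` smoothing parameter, and the example data are all hypothetical.

```python
import math
from collections import Counter


def fit_categorical(values, domain, alpha=1.0):
    """Fit a Laplace-smoothed categorical distribution over `domain`.

    Assumption: a single attribute stands in for the full relational model.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(domain)
    return {v: (counts[v] + alpha) / total for v in domain}


def likelihood_ratio_outlierness(object_values, population_values, alpha=1.0):
    """Average log-likelihood ratio of the object's data under the
    object-specific model versus the population model.

    Larger scores mean the object's data are better explained by its own
    model than by the population model, i.e. the object looks more unusual.
    """
    domain = sorted(set(object_values) | set(population_values))
    p_obj = fit_categorical(object_values, domain, alpha)
    p_pop = fit_categorical(population_values, domain, alpha)
    log_ratios = [math.log(p_obj[v] / p_pop[v]) for v in object_values]
    return sum(log_ratios) / len(log_ratios)


# Hypothetical data: match outcomes for a population and two players.
population = ["win"] * 50 + ["loss"] * 50
typical_player = ["win"] * 5 + ["loss"] * 5
unusual_player = ["win"] * 10

typical_score = likelihood_ratio_outlierness(typical_player, population)
unusual_score = likelihood_ratio_outlierness(unusual_player, population)
```

In this toy setting the player whose outcome distribution matches the population scores near zero, while the player with only wins scores higher. A relational version would replace the single-attribute frequency model with a model over attributes and links.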
Pages: 681-722
Page count: 41