Model-based exception mining for object-relational data

Cited by: 0
Authors
Fatemeh Riahi
Oliver Schulte
Affiliations
[1] Simon Fraser University
Source
Data Mining and Knowledge Discovery | 2020 / Vol. 34
Keywords
Outlier detection; Exception mining; Statistical-relational learning; Bayesian network; Likelihood ratio; Network data
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
This paper develops model-based exception mining and outlier detection for the case of object-relational data. Object-relational data represent a complex heterogeneous network, which comprises objects of different types, links among these objects, also of different types, and attributes of these links. We follow the well-established exceptional model mining (EMM) framework, which has previously been applied to subgroup discovery in propositional data; our novel contribution is to develop EMM for relational data. EMM leverages machine learning models for exception mining: an object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. In relational data, EMM can therefore be used to detect single outlier or exceptional objects. We combine EMM with state-of-the-art statistical-relational model discovery methods for constructing a graphical model (Bayesian network) that compactly represents probabilistic associations in the data. We investigate several outlierness metrics, based on the learned object-relational model, that quantify the extent to which the association pattern of a potential outlier object deviates from that of the whole population. Our method is validated on synthetic data sets and on real-world data sets about soccer and hockey matches, IMDb movies and mutagenic compounds. Compared to baseline methods, the EMM approach achieved the best detection accuracy when combined with a novel outlierness metric. An empirical evaluation on soccer and movie data shows a strong correlation between our novel outlierness metric and success metrics: individuals that our metric marks out as unusual tend to have unusual success.
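As a rough illustration of the core EMM idea stated in the abstract (an object is exceptional to the extent that a model fitted to its own data differs from the population model), the following minimal sketch scores objects by the Kullback-Leibler divergence between two smoothed categorical distributions. It is not the paper's method: the single categorical distribution stands in for the learned Bayesian network, KL divergence stands in for the outlierness metrics the paper investigates, and the names `distribution`, `kl_divergence`, `outlierness`, and the toy records are all hypothetical.

```python
import math
from collections import Counter


def distribution(values, support, alpha=1.0):
    """Laplace-smoothed relative frequencies over a fixed support."""
    counts = Counter(values)
    total = len(values) + alpha * len(support)
    return {v: (counts[v] + alpha) / total for v in support}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined on the same support."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def outlierness(records):
    """Score each object by how far its own model is from the population model.

    records: list of (object_id, outcome) pairs (toy stand-in for relational data).
    Returns a dict mapping object_id to KL(object model || population model).
    """
    support = sorted({outcome for _, outcome in records})
    population = distribution([outcome for _, outcome in records], support)

    by_object = {}
    for obj, outcome in records:
        by_object.setdefault(obj, []).append(outcome)

    return {
        obj: kl_divergence(distribution(vals, support), population)
        for obj, vals in by_object.items()
    }


# Hypothetical toy data: player A always wins, B and C have mixed results.
records = (
    [("A", "win")] * 10
    + [("B", "win")] * 5 + [("B", "loss")] * 5
    + [("C", "win")] * 4 + [("C", "loss")] * 6
)
scores = outlierness(records)
most_exceptional = max(scores, key=scores.get)
```

In this toy data, object A receives the highest score because its outcome distribution departs most from the pooled distribution. Smoothing keeps every probability strictly positive, so the logarithm is always defined.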
Pages: 681-722
Page count: 41