Subjective interestingness of subgraph patterns

Cited by: 24

Authors:

van Leeuwen, Matthijs [1,2]

De Bie, Tijl [3,4]

Spyropoulou, Eirini [3]

Mesnage, Cedric [3]

Affiliations:

[1] Katholieke Univ Leuven, Dept Comp Sci, Machine Learning, Leuven, Belgium

[2] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands

[3] Univ Bristol, Intelligent Syst Lab, Bristol, Avon, England

[4] Univ Ghent, Data Sci Lab, Ghent, Belgium

Source:

MACHINE LEARNING | 2016 / Vol. 105 / Issue 01

Funding:

Engineering and Physical Sciences Research Council (UK); European Research Council;

Keywords:

Dense subgraph patterns; Community detection; Subjective interestingness; Maximum entropy; DISCOVERY;

DOI:

10.1007/s10994-015-5539-3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The utility of a dense subgraph in gaining a better understanding of a graph has been formalised in numerous ways, each striking a different balance between approximating actual interestingness and computational efficiency. A difficulty in making this trade-off is that, while computational cost of an algorithm is relatively well-defined, a pattern's interestingness is fundamentally subjective. This means that this latter aspect is often treated only informally or neglected, and instead some form of density is used as a proxy. We resolve this difficulty by formalising what makes a dense subgraph pattern interesting to a given user. Unsurprisingly, the resulting measure is dependent on the prior beliefs of the user about the graph. For concreteness, in this paper we consider two cases: one case where the user only has a belief about the overall density of the graph, and another case where the user has prior beliefs about the degrees of the vertices. Furthermore, we illustrate how the resulting interestingness measure is different from previous proposals. We also propose effective exact and approximate algorithms for mining the most interesting dense subgraph according to the proposed measure. Usefully, the proposed interestingness measure and approach lend themselves well to iterative dense subgraph discovery. Contrary to most existing approaches, our method naturally allows subsequently found patterns to be overlapping. The empirical evaluation highlights the properties of the new interestingness measure given different prior belief sets, and our approach's ability to find interesting subgraphs that other methods are unable to find.


Pages: 41-75

Page count: 35
