Discovering statistically non-redundant subgroups

Cited by: 15

Authors:

Li, Jiuyong [1]

Liu, Jixue [1]

Toivonen, Hannu [2]

Satou, Kenji [3]

Sun, Youqiang [4,5]

Sun, Bingyu [4]

Affiliations:

[1] Univ S Australia, Sch Informat Technol & Math Sci, Adelaide, SA 5001, Australia

[2] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland

[3] Kanazawa Univ, Grad Sch Nat Sci & Technol, Kanazawa, Ishikawa 9201192, Japan

[4] Chinese Acad Sci, Inst Intelligent Machines, Beijing 100864, Peoples R China

[5] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Anhui, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2014 / Vol. 67

Funding:

Australian Research Council; Academy of Finland

Keywords:

Subgroups; Non-redundancy; Odds ratio; Rules; Search space pruning; ASSOCIATION RULES; CONTRAST SET; ALGORITHM; PATTERN; SD;

DOI:

10.1016/j.knosys.2014.04.030

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405

Abstract:

The objective of subgroup discovery is to find groups of individuals who are statistically different from others in a large data set. Most existing measures of subgroup quality are intuitive and do not precisely capture the statistical difference between a group and the others, and the results they produce contain many redundant subgroups. The odds ratio is a statistically sound measure of the difference between two groups for a given outcome, which makes it well suited to quantifying subgroup quality. In this paper, we propose a statistically sound framework for statistically non-redundant subgroup discovery: measuring the quality of subgroups by the odds ratio and defining statistically non-redundant subgroups by the error bounds of odds ratios. We show that our proposed method is faster than most existing methods and discovers the complete set of statistically non-redundant subgroups. © 2014 Elsevier B.V. All rights reserved.
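
As a rough illustration of the two quantities the abstract names, the sketch below computes an odds ratio and its approximate error bounds for a subgroup from a 2x2 table of counts. It is not the paper's algorithm: the function names, the standard log-based (Wald) 95% interval, the 0.5 zero-cell correction, and the redundancy test (a more specific subgroup counts as redundant if its odds ratio falls inside the interval of its more general subgroup) are all assumptions made for the example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence bounds for a 2x2 table.

    a: inside the subgroup, outcome present
    b: inside the subgroup, outcome absent
    c: outside the subgroup, outcome present
    d: outside the subgroup, outcome absent
    """
    # Assumption: add 0.5 to every cell when any cell is zero,
    # so that the ratio and its standard error stay finite.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper

def looks_redundant(general, specific):
    """Hypothetical check: the specific subgroup adds nothing if its
    odds ratio lies within the error bounds of the general subgroup."""
    _, lower, upper = odds_ratio_ci(*general)
    ratio_specific, _, _ = odds_ratio_ci(*specific)
    return lower <= ratio_specific <= upper

# Illustrative counts only; they do not come from the paper.
general = (20, 80, 10, 90)
print(odds_ratio_ci(*general))
print(looks_redundant(general, (12, 38, 18, 132)))
print(looks_redundant(general, (15, 5, 15, 165)))
```

Under this reading, a specialization is kept only when its odds ratio moves outside the bounds already established by a more general subgroup; the exact definition used by the authors is given in the paper itself.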

Citation

Pages: 315-327

Page count: 13
