Discovering statistically non-redundant subgroups

Cited by: 15
Authors
Li, Jiuyong [1]
Liu, Jixue [1]
Toivonen, Hannu [2]
Satou, Kenji [3]
Sun, Youqiang [4, 5]
Sun, Bingyu [4]
Affiliations
[1] Univ S Australia, Sch Informat Technol & Math Sci, Adelaide, SA 5001, Australia
[2] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland
[3] Kanazawa Univ, Grad Sch Nat Sci & Technol, Kanazawa, Ishikawa 9201192, Japan
[4] Chinese Acad Sci, Inst Intelligent Machines, Beijing 100864, Peoples R China
[5] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Anhui, Peoples R China
Funding
Australian Research Council; Academy of Finland
Keywords
Subgroups; Non-redundancy; Odds ratio; Rules; Search space pruning; ASSOCIATION RULES; CONTRAST SET; ALGORITHM; PATTERN; SD
DOI
10.1016/j.knosys.2014.04.030
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The objective of subgroup discovery is to find groups of individuals who are statistically different from others in a large data set. Most existing measures of subgroup quality are intuitive and do not precisely capture the statistical difference between a group and the rest of the data, and the results they discover contain many redundant subgroups. The odds ratio is a statistically sound measure of the difference between two groups with respect to a certain outcome, and it is well suited to quantifying the quality of subgroups. In this paper, we propose a statistically sound framework for statistically non-redundant subgroup discovery: measuring the quality of subgroups by the odds ratio and defining statistically non-redundant subgroups by the error bounds of odds ratios. We show that our proposed method is faster than most existing methods and discovers the complete set of statistically non-redundant subgroups. (C) 2014 Elsevier B.V. All rights reserved.
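To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of an odds ratio with its error bounds, computed from a 2x2 table of counts. It uses the standard log-odds-ratio (Wald) confidence interval; the function names `odds_ratio_ci` and `is_redundant`, the 0.5 zero-cell correction, and the interval-overlap redundancy test are assumptions made for illustration. The record does not give the paper's exact definitions, so this should not be read as the authors' algorithm.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: inside the subgroup, with the outcome
    b: inside the subgroup, without the outcome
    c: outside the subgroup, with the outcome
    d: outside the subgroup, without the outcome
    z: normal quantile (1.96 gives roughly a 95% interval)
    """
    # Assumption: add 0.5 to every cell when any cell is zero,
    # so that the ratio and its standard error stay finite.
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


def is_redundant(general_table, specific_table, z=1.96):
    """Hypothetical redundancy test (an assumption, not the paper's definition).

    A more specific subgroup is treated as redundant when the error bounds
    of its odds ratio overlap those of the more general subgroup.
    """
    _, g_lo, g_hi = odds_ratio_ci(*general_table, z=z)
    _, s_lo, s_hi = odds_ratio_ci(*specific_table, z=z)
    return s_lo <= g_hi and g_lo <= s_hi


if __name__ == "__main__":
    general = (20, 80, 10, 90)
    print(odds_ratio_ci(*general))
    print(is_redundant(general, (12, 38, 18, 132)))
    print(is_redundant(general, (90, 10, 10, 90)))
```

Under this sketch, a refinement of a subgroup would be reported only when its interval separates from that of the subgroup it refines, which is one plausible reading of "non-redundant by error bounds".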
Pages: 315-327
Number of pages: 13