Mining significant association rules from uncertain data

Cited by: 8

Authors:

Zhang, Anshu [1]

Shi, Wenzhong [1]

Webb, Geoffrey I. [2]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Land Surveying & Geoinformat, Kowloon, Hong Kong, Peoples R China

[2] Monash Univ, Fac Informat Technol, Melbourne, Vic 3800, Australia

Source:

DATA MINING AND KNOWLEDGE DISCOVERY | 2016 / Vol. 30 / Issue 04

Keywords:

Pattern discovery; Association rules; Statistical evaluation; Uncertain data; Accuracy

DOI:

10.1007/s10618-015-0446-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is a persistent barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior in preventing spurious rules, yet it can also cause severe loss of true rules in the presence of data error. This study presents a new and improved method for statistical testing of association rules on uncertain, erroneous data. An original mathematical model was established to describe how data error propagates through the computational procedures of the statistical test. Based on the error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the loss of true rules (that is, it reduces the type II error) that data error causes in the original statistically sound method. Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust against inaccurate data error probability information and against situations that do not fulfil the commonly accepted assumption of independent error probabilities for different data items. The method is particularly effective for rules that are the most practically meaningful yet sensitive to data error. The method shows promise in enhancing the value of association rule mining results and helping users make correct decisions.
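As a rough illustration of the kind of significance test with familywise error control that the abstract refers to, the following minimal Python sketch tests a single rule X -> Y. It rests on assumptions not stated in this record: a one-sided Fisher exact test on the 2x2 contingency table and a Bonferroni adjustment over the number of candidate rules. The function names `fisher_one_sided` and `is_significant` are hypothetical, and the sketch does not implement the data-error correction that the paper proposes.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for a rule X -> Y (assumed test).

    a = #(X and Y), b = #(X, not Y), c = #(not X, Y), d = #(neither).
    Returns the probability of seeing at least `a` co-occurrences when
    X and Y are independent and the table margins are held fixed.
    """
    n = a + b + c + d
    row_x = a + b  # transactions containing X
    col_y = a + c  # transactions containing Y
    upper = min(row_x, col_y)
    # Hypergeometric upper tail; comb() returns 0 for impossible tables.
    tail = sum(
        comb(col_y, k) * comb(n - col_y, row_x - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, row_x)


def is_significant(p_value, n_candidate_rules, alpha=0.05):
    """Bonferroni-adjusted decision (assumed correction): dividing alpha
    by the number of candidate rules bounds the familywise error rate."""
    return p_value <= alpha / n_candidate_rules


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    p = fisher_one_sided(30, 10, 20, 40)
    print(p, is_significant(p, n_candidate_rules=1000))
```

Because the adjusted threshold shrinks as the number of candidate rules grows, any noise that weakens the observed co-occurrence count can push a genuine rule above the threshold, which is the loss of true rules that the abstract attributes to data error.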


Pages: 928-963

Page count: 36
