Mining significant association rules from uncertain data

Cited by: 0
Authors
Anshu Zhang
Wenzhong Shi
Geoffrey I. Webb
Affiliations
[1] The Hong Kong Polytechnic University, Department of Land Surveying and Geo
[2] Monash University, Informatics
Source
Data Mining and Knowledge Discovery | 2016 / Vol. 30
Keywords
Pattern discovery; Association rules; Statistical evaluation; Uncertain data
DOI
Not available
Abstract
In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is a persistent barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior at preventing spurious rules, yet it can cause severe loss of true rules in the presence of data error. This study presents an improved method for statistically testing association rules mined from uncertain, erroneous data. An original mathematical model was established to describe how data error propagates through the computational procedures of the statistical test. Based on this error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the true rules lost to data error under the original statistically sound method (that is, it reduces type-2 error). Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust to inaccurate information about data error probabilities and to situations that violate the commonly accepted assumption that the error probabilities of different data items are independent. The method is particularly effective for rules that are the most practically meaningful yet sensitive to data error. The method shows promise for enhancing the value of association rule mining results and helping users make correct decisions.
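The problem the abstract describes can be illustrated with a small sketch. It is not the authors' error model or correction scheme. It assumes, purely for illustration, that each rule A -> B is tested with a one-sided Fisher exact test under a Bonferroni-adjusted threshold, and that data error flips the consequent B in 20% of records. The function names, the counts, and the figure of 1000 candidate rules are all hypothetical.

```python
from math import comb


def fisher_p_value(a, b, c, d):
    """One-sided Fisher exact p-value for positive association in a 2x2 table.

    a = count(A and B), b = count(A and not B),
    c = count(not A and B), d = count(not A and not B).
    """
    n = a + b + c + d
    denom = comb(n, a + c)
    # Sum the hypergeometric probabilities of the observed table
    # and of every table with a larger count(A and B).
    total = 0
    for i in range(min(b, c) + 1):
        total += comb(a + b, a + i) * comb(c + d, c - i)
    return total / denom


def is_significant(p_value, alpha, n_tests):
    """Bonferroni adjustment: bounds the familywise error rate by alpha."""
    return p_value <= alpha / n_tests


# Hypothetical clean data: 100 records with a strong A -> B association.
clean = (40, 10, 10, 40)

# Assumed error model (illustration only): B is flipped in 20% of records.
# Expected counts: 40*0.8 + 10*0.2 = 34 and 10*0.8 + 40*0.2 = 16.
noisy = (34, 16, 16, 34)

p_clean = fisher_p_value(*clean)
p_noisy = fisher_p_value(*noisy)

# Hypothetical search space of 1000 candidate rules.
print(p_clean, is_significant(p_clean, 0.05, 1000))
print(p_noisy, is_significant(p_noisy, 0.05, 1000))
```

Under these assumptions the same underlying rule passes the adjusted threshold on the clean counts and fails it on the error-attenuated counts. This is the loss of true rules (type-2 error) that the abstract says the proposed method recovers.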
Pages: 928-963
Page count: 35
Related papers
50 in total
  • [31] Effect of data skewness in parallel mining of association rules
    Cheung, DW
    Xiao, YQ
    [J]. RESEARCH AND DEVELOPMENT IN KNOWLEDGE DISCOVERY AND DATA MINING, 1998, 1394 : 48 - 60
  • [32] Mining Association Rules for RFID Data with Concept Hierarchy
    Kim, Younghee
    Kim, Ungmo
    Jung, Myungsook
    Kang, Woojun
    Noh, Youngju
    [J]. 11TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, VOLS I-III, PROCEEDINGS: UBIQUITOUS ICT CONVERGENCE MAKES LIFE BETTER!, 2009, : 1002 - +
  • [33] Mining spatial gene expression data for association rules
    van Hemert, Jano
    Baldock, Richard
    [J]. BIOINFORMATICS RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2007, 4414 : 66 - +
  • [34] Efficient algorithm for the extraction of association rules in data mining
    Mitra, Pinaki
    Chaudhuri, Chitrita
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2006, PT 2, 2006, 3981 : 1 - 10
  • [35] RESEARCH OF DATA MINING ALGORITHM BASED ON ASSOCIATION RULES
    Song, Changxin
    Ma, Ke
    [J]. PROCEEDINGS OF THE 2011 3RD INTERNATIONAL CONFERENCE ON FUTURE COMPUTER AND COMMUNICATION (ICFCC 2011), 2011, : 243 - +
  • [36] Data Mining Technique and Application Based on Association Rules
    Li, Tong
    Cheng, Yuepeng
    Liu, Yuli
    [J]. 2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2009, : 304 - 306
  • [37] On the mining of association rules in medical image data sets
    Ehikioya, SA
    Olukunle, A
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL V, PROCEEDINGS: COMPUTER SCI I, 2002, : 17 - 22
  • [38] The application research of Extenics in association rules data mining
    Li, Qingshui
    Jiang, Wenhua
    [J]. FOURTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2011): COMPUTER VISION AND IMAGE ANALYSIS: PATTERN RECOGNITION AND BASIC TECHNOLOGIES, 2012, 8350
  • [39] Association Rules Mining over Data Streams: Review
    Tan, Jun
    [J]. ADVANCES IN CIVIL ENGINEERING II, PTS 1-4, 2013, 256-259 : 2890 - 2893
  • [40] Mining Significant Association Rules from on Information and System Quality of Indonesian E-Government Dataset
    Jacob, Deden Witarsyah
    Fudzee, Mohd Farhan Md
    Salamat, Mohamad Aizi
    Saedudin, Rohmat
    Abdullah, Zailani
    Herawan, Tutut
    [J]. RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING, 2017, 549 : 608 - 618