CMRules: Mining sequential rules common to several sequences

被引:85
作者
Fournier-Viger, Philippe [1 ]
Faghihi, Usef [2 ]
Nkambou, Roger [3 ]
Nguifo, Engelbert Mephu [4 ,5 ]
机构
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
[2] Univ Memphis, Dept Comp Sci, Memphis, TN 38152 USA
[3] Univ Quebec, Dept Comp Sci, Montreal, PQ H3C 3P8, Canada
[4] Univ Blaise Pascal, Clermont Univ, LIMOS, F-63000 Clermont Ferrand, France
[5] LIMOS, CNRS, UMR 6158, F-63173 Aubiere, France
基金
加拿大自然科学与工程研究理事会;
关键词
Sequential rule mining; Sequential pattern mining; Association rule mining; Sequence database; Educational data mining; PATTERNS;
D O I
10.1016/j.knosys.2011.07.005
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Sequential rule mining is an important data mining task used in a wide range of applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe a same phenomenon. This can have many undesirable effects such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, (3) and rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining this form of rules. The algorithm proceeds by first finding association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet the minimum confidence and support thresholds according to the sequential ordering. We evaluate the performance of CMRules in three different ways. First, we provide an analysis of its time complexity. Second, we compare its performance (in terms of execution time, memory usage and scalability) with an adaptation of an algorithm from the literature that we name CMDeo. For this comparison, we use three real-life public datasets, which have different characteristics and represent three kinds of data. In many cases, results show that CMRules is faster and has a better scalability for low support thresholds than CMDeo. Lastly, we report a successful application of the algorithm in a tutoring agent. (C) 2011 Elsevier B.V. All rights reserved.
引用
收藏
页码:63 / 76
页数:14
相关论文
共 38 条
  • [1] Parallel mining of association rules
    Agrawal, R
    Shafer, JC
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) : 962 - 969
  • [2] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
  • [3] AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
  • [4] Ayres J., 2002, P ACM SIGKDD INT C K, P429
  • [5] Maintenance of discovered association rules in large databases: Art incremental updating technique
    Cheung, DW
    Han, JW
    Ng, VT
    Wong, CY
    [J]. PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, : 106 - 114
  • [6] Das G., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P16
  • [7] Deogun J, 2005, LECT NOTES ARTIF INT, V3642, P98, DOI 10.1007/11548706_11
  • [8] Size of random Galois lattices and number of closed frequent itemsets
    Emilion, Richard
    Levy, Gerard
    [J]. DISCRETE APPLIED MATHEMATICS, 2009, 157 (13) : 2945 - 2957
  • [9] Faghihi U., J EXPT THEO IN PRESS
  • [10] Faghihi U., 2010, P IEA AIE 2010 CORD, P438