Pareto-optimal patterns in logical analysis of data

Cited by: 58
Authors
Hammer, PL
Kogan, A
Simeone, B
Szedmák, S
Affiliations
[1] Rutgers State Univ, RUTCOR, Piscataway, NJ 08854 USA
[2] Rutgers State Univ, Rutgers Business Sch, Newark, NJ 07102 USA
[3] Univ Roma La Sapienza, Dipartimento Stat Probali & Stat Applicate, I-00185 Rome, Italy
Keywords
extremal patterns; data mining; machine learning; classification accuracy; Boolean functions
DOI
10.1016/j.dam.2003.08.013
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Patterns are the key building blocks in the logical analysis of data (LAD). It has been observed in empirical studies and practical applications that some patterns are more "suitable" than others for use in LAD. In this paper, we model various such suitability criteria as partial preorders defined on the set of patterns. We introduce three such preferences, and describe patterns which are Pareto-optimal with respect to any one of them, or to certain combinations of them. We develop polynomial time algorithms for recognizing Pareto-optimal patterns, as well as for transforming an arbitrary pattern to a better Pareto-optimal one with respect to any one of the considered criteria, or their combinations. We obtain analytical representations characterizing some of the sets of Pareto-optimal patterns, and investigate the computational complexity of generating all Pareto-optimal patterns. The empirical evaluation of the relative merits of various types of Pareto-optimality is carried out by comparing the classification accuracy of Pareto-optimal theories on several real-life data sets. This evaluation indicates the advantages of "strong patterns", i.e., those patterns which are Pareto-optimal with respect to the "evidential preference" introduced in this paper. © 2004 Elsevier B.V. All rights reserved.
Pages: 79-102
Page count: 24