Multi-pattern generation framework for logical analysis of data

被引:10
作者
Chou, Chun-An [1 ]
Bonates, Tiberius O. [4 ]
Lee, Chungmok [5 ]
Chaovalitwongse, Wanpracha Art [2 ,3 ]
机构
[1] SUNY Binghamton, Dept Syst Sci & Ind Engn, Vestal, NY 13850 USA
[2] Univ Washington, Dept Ind & Syst Engn, Seattle, WA 98195 USA
[3] Univ Washington, Dept Radiol, Seattle, WA 98195 USA
[4] Univ Fed Ceara, Dept Stat & Appl Math, Fortaleza, CE, Brazil
[5] Hankuk Univ Foreign Studies, Dept Ind & Management Engn, Yongin 449791, Gyeonggi Do, South Korea
基金
美国国家科学基金会;
关键词
Logical analysis of data; Combinatorial optimization; Column generation; Pattern mining; Classification;
D O I
10.1007/s10479-015-1867-8
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
Logical analysis of data (LAD) is a rule-based data mining algorithm using combinatorial optimization and boolean logic for binary classification. The goal is to construct a classification model consisting of logical patterns (rules) that capture structured information from observations. Among the four steps of LAD framework (binarization, feature selection, pattern generation, and model construction), pattern generation has been considered the most important step. Combinatorial enumeration approaches to generate all possible patterns were mostly studied in the literature; however, those approaches suffered from the computational complexity of pattern generation that grows exponentially with data (feature) size. To overcome the problem, recent studies proposed column generation-based approaches to improve the efficacy of building a LAD model with a maximum-margin objective. There was still a difficulty in solving subproblems efficiently to generate patterns. In this study, a new column generation framework is proposed, in which a new mixed-integer linear programming approach is developed to generate multiple patterns having maximum coverage in subproblems at each iteration. In addition to the maximum-margin objective, we propose an alternative objective (minimum-pattern) to solve the LAD problem as a minimum set covering problem. The proposed approaches are evaluated on the datasets from the University of California Irvine Machine Learning Repository. The computational experiments provide comparable performances compared with previous LAD and other well-known classification algorithms.
引用
收藏
页码:329 / 349
页数:21
相关论文
共 31 条
[1]   Spanned patterns for the logical analysis of data [J].
Alexe, G ;
Hammer, PL .
DISCRETE APPLIED MATHEMATICS, 2006, 154 (07) :1039-1049
[2]   Logical analysis of diffuse large B-cell lymphomas [J].
Alexe, G ;
Alexe, S ;
Axelrod, DE ;
Hammer, PL ;
Weissmann, D .
ARTIFICIAL INTELLIGENCE IN MEDICINE, 2005, 34 (03) :235-267
[3]   Ovarian cancer detection by logical analysis of proteomic data [J].
Alexe, G ;
Alexe, S ;
Liotta, LA ;
Petricoin, E ;
Reiss, M ;
Hammer, PL .
PROTEOMICS, 2004, 4 (03) :766-783
[4]   Comprehensive vs. comprehensible classifiers in logical analysis of data [J].
Alexe, Gabriela ;
Alexe, Sorin ;
Hammer, Peter L. ;
Kogan, Alexander .
DISCRETE APPLIED MATHEMATICS, 2008, 156 (06) :870-882
[5]   Logical analysis of data - the vision of Peter L. Hammer [J].
Alexe, Gabriela ;
Alexe, Sorin ;
Bonates, Tiberius O. ;
Kogan, Alexander .
ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2007, 49 (1-4) :265-312
[6]   Accelerated algorithm for pattern detection in logical analysis of data [J].
Alexe, S ;
Hammer, PL .
DISCRETE APPLIED MATHEMATICS, 2006, 154 (07) :1050-1063
[7]   Coronary risk prediction by logical analysis of data [J].
Alexe, S ;
Blackstone, E ;
Hammer, PL ;
Ishwaran, H ;
Lauer, MS ;
Snader, CEP .
ANNALS OF OPERATIONS RESEARCH, 2003, 119 (1-4) :15-42
[8]  
Alexe S, 2007, SPRINGER SER OPTIM A, V7, P3
[9]  
Bache K., 2013, UCI Machine Learning Repository
[10]   Branch-and-price: Column generation for solving huge integer programs [J].
Barnhart, C ;
Johnson, EL ;
Nemhauser, GL ;
Savelsbergh, MWP ;
Vance, PH .
OPERATIONS RESEARCH, 1998, 46 (03) :316-329