Accurate Intelligible Models with Pairwise Interactions

Cited by: 351
Authors
Lou, Yin [1]
Caruana, Rich [2]
Gehrke, Johannes [1]
Hooker, Giles [3]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] Microsoft Corp, Microsoft Res, Redmond, WA 98052 USA
[3] Cornell Univ, Dept Stat Sci, Ithaca, NY 14853 USA
Source
19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13) | 2013
Keywords
classification; regression; interaction detection
DOI
10.1145/2487575.2487579
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, their accuracy is significantly less than more complex models that permit interactions. In this paper, we suggest adding selected terms of interacting pairs of features to standard GAMs. The resulting models, which we call GA²M-models, for Generalized Additive Models plus Interactions, consist of univariate terms and a small number of pairwise interaction terms. Since these models only include one- and two-dimensional components, the components of GA²M-models can be visualized and interpreted by users. To explore the huge (quadratic) number of pairs of features, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion into the model. In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA²M-models have almost the same performance as the best full-complexity models on a number of real datasets. Thus this paper postulates that for many problems, GA²M-models can yield models that are both intelligible and accurate.
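
The model structure described in the abstract (univariate terms plus a small number of pairwise interaction terms) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each component is a piecewise-constant lookup table over binned feature values; the abstract does not specify how components are represented. The names `ga2m_score`, `main_terms`, `pair_terms`, and the example values are hypothetical. The sketch covers scoring only; it does not implement FAST or any fitting procedure.

```python
from bisect import bisect_right

def ga2m_score(x, intercept, main_terms, pair_terms):
    """Score one example as intercept + sum of 1-D terms + sum of 2-D terms.

    main_terms: {i: (edges, values)}, with len(values) == len(edges) + 1
    pair_terms: {(i, j): (edges_i, edges_j, grid)}, grid indexed [bin_i][bin_j]
    (Lookup-table components are an assumption made for this sketch.)
    """
    score = intercept
    # Univariate terms: each feature contributes the value of its own bin.
    for i, (edges, values) in main_terms.items():
        score += values[bisect_right(edges, x[i])]
    # Pairwise terms: each selected pair contributes one cell of a 2-D grid.
    for (i, j), (edges_i, edges_j, grid) in pair_terms.items():
        score += grid[bisect_right(edges_i, x[i])][bisect_right(edges_j, x[j])]
    return score

# Hypothetical two-feature model with a single interaction term.
example_main = {0: ([0.5], [-1.0, 1.0]), 1: ([0.5], [0.5, -0.5])}
example_pairs = {(0, 1): ([0.5], [0.5], [[0.0, 0.25], [-0.25, 0.0]])}
example_score = ga2m_score([0.2, 0.7], 0.1, example_main, example_pairs)
```

Because every component depends on at most two features, each table in `example_main` can be drawn as a line plot and each grid in `example_pairs` as a heat map, which is the sense in which the abstract calls the components visualizable.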
Pages: 623-631
Page count: 9
Related papers
17 in total
[1]   An empirical comparison of voting classification algorithms: Bagging, boosting, and variants [J].
Bauer, E ;
Kohavi, R .
MACHINE LEARNING, 1999, 36 (1-2) :105-139
[2]   Greedy function approximation: A gradient boosting machine [J].
Friedman, JH .
ANNALS OF STATISTICS, 2001, 29 (05) :1189-1232
[3]   PREDICTIVE LEARNING VIA RULE ENSEMBLES [J].
Friedman, Jerome H. ;
Popescu, Bogdan E. .
ANNALS OF APPLIED STATISTICS, 2008, 2 (03) :916-954
[4]   Guyon I., 2003, J MACH LEARN RES, V3, P1157
[5]   Hastie T., 1986, STAT SCI, P297, DOI 10.1214/SS/1177013604
[6]   Hooker G., 2004, P 10 ACM SIGKDD INT
[7]   Generalized functional ANOVA diagnostics for high-dimensional functions of dependent variables [J].
Hooker, Giles .
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2007, 16 (03) :709-732
[8]   LI P, 2007, Advances in neural information processing systems, V20, P897
[9]   Loh WY, 2002, STAT SINICA, V12, P361
[10]   Lou Y., 2012, KDD