Information criteria for model selection

Times cited: 34
Authors
Zhang, Jiawei [1]
Yang, Yuhong [1]
Ding, Jie [1, 2]
Affiliations
[1] Univ Minnesota Twin Cities, Sch Stat, Minneapolis, MN USA
[2] Univ Minnesota Twin Cities, Sch Stat, Minneapolis, MN 55455 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Akaike information criterion; Bayesian information criterion; information criteria; model selection; variable selection; DIMENSIONAL LINEAR-REGRESSION; VARIABLE SELECTION; CROSS-VALIDATION; MINIMAX RATES; COMPLEXITY; PREDICTION; ORDER; PRINCIPLE; PENALTIES; PROPERTY;
DOI
10.1002/wics.1607
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
The rapid development of modeling techniques has created many opportunities for data-driven discovery and prediction. However, it also raises the challenge of selecting the most appropriate model for a given data task. Information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), form a general class of model selection methods with deep connections to foundational ideas in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, and the answers often depend on the particular data setting. This review revisits information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions, with the aim of enriching the understanding of model selection in general.

This article is categorized under:
Data: Types and Structure > Traditional Statistical Data
Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods
Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods
Statistical Models > Model Selection
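To make the two criteria named in the abstract concrete, here is a minimal sketch that is not taken from the review. It assumes the textbook definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where L is the maximized likelihood, k the number of free parameters, and n the sample size; smaller values are preferred. The two Gaussian candidate models and the helper names `gaussian_max_loglik` and `score_models` are hypothetical illustrations.

```python
import math
from typing import Dict, Sequence, Tuple


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k log(n) - 2 log L (smaller is better)."""
    return k * math.log(n) - 2 * loglik


def gaussian_max_loglik(x: Sequence[float], mu: float) -> float:
    """Gaussian log-likelihood at mean mu, with the variance set to its MLE.

    Plugging sigma^2 = sum((x - mu)^2) / n into the normal log-likelihood
    gives -n/2 * (log(2 * pi * sigma^2) + 1).
    """
    n = len(x)
    sigma2 = sum((xi - mu) ** 2 for xi in x) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def score_models(x: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Return (AIC, BIC) for two hypothetical nested Gaussian models."""
    n = len(x)
    xbar = sum(x) / n
    candidates = {
        # Only the variance is free: k = 1.
        "M0: mean fixed at 0": (gaussian_max_loglik(x, 0.0), 1),
        # Mean and variance are both free: k = 2.
        "M1: mean estimated": (gaussian_max_loglik(x, xbar), 2),
    }
    return {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in candidates.items()}


if __name__ == "__main__":
    data = [5.1, 4.9, 5.2, 4.8, 5.0, 5.3, 4.7, 5.1, 4.9, 5.0]
    scores = score_models(data)
    for name, (a, b) in scores.items():
        print(f"{name}: AIC={a:.2f}, BIC={b:.2f}")
    print("AIC picks:", min(scores, key=lambda m: scores[m][0]))
    print("BIC picks:", min(scores, key=lambda m: scores[m][1]))
```

Both criteria trade goodness of fit against the parameter count; they differ only in the per-parameter penalty (2 for AIC versus log(n) for BIC), so BIC penalizes extra parameters more heavily once n exceeds about 7.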
Pages: 27