Five myths about variable selection

Cited by: 374
Authors
Heinze, Georg [1]
Dunkler, Daniela [1]
Affiliations
[1] Med Univ Vienna, Ctr Med Stat Informat & Intelligent Syst, Sect Clin Biometr, Spitalgasse 23, A-1090 Vienna, Austria
Keywords
association; explanatory models; multivariable modeling; prediction; statistical analysis; LIVER-TRANSPLANTATION; SURVIVAL; RECIPIENTS; EVENTS; MODEL;
DOI
10.1111/tri.12895
Chinese Library Classification (CLC) number
R61 [Surgical operations];
Subject classification code
Abstract
Multivariable regression models are often used in transplantation research to identify or to confirm baseline variables which have an independent association, causally or only evidenced by statistical correlation, with transplantation outcome. Although sound theory is lacking, variable selection is a popular statistical method which seemingly reduces the complexity of such models. However, in fact, variable selection often complicates analysis as it invalidates common tools of statistical inference such as P-values and confidence intervals. This is a particular problem in transplantation research where sample sizes are often only small to moderate. Furthermore, variable selection requires computer-intensive stability investigations and a particularly cautious interpretation of results. We discuss how five common misconceptions often lead to inappropriate application of variable selection. We emphasize that variable selection and all problems related with it can often be avoided by the use of expert knowledge.
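The abstract notes that variable selection calls for computer-intensive stability investigations. One common form of such an investigation is to repeat the selection on bootstrap resamples and record how often each variable is retained. The following is a minimal sketch of that idea, not the authors' procedure: the function names `backward_eliminate` and `inclusion_frequencies`, the |t| >= 1.96 retention rule, the ordinary least squares model, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def backward_eliminate(X, y, t_crit=1.96):
    """Hypothetical helper: drop the weakest predictor until all |t| >= t_crit.

    Fits ordinary least squares with an intercept and returns the column
    indices of X that survive. The 1.96 cut-off is an illustrative assumption.
    """
    n = X.shape[0]
    active = list(range(X.shape[1]))
    while active:
        A = np.column_stack([np.ones(n), X[:, active]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sigma2 = resid @ resid / (n - A.shape[1])
        cov = sigma2 * np.linalg.inv(A.T @ A)
        # t-statistics of the predictors (the intercept is never removed)
        t = np.abs(beta[1:] / np.sqrt(np.diag(cov)[1:]))
        weakest = int(np.argmin(t))
        if t[weakest] >= t_crit:
            break
        active.pop(weakest)
    return active


def inclusion_frequencies(X, y, n_boot=200, seed=0):
    """Share of bootstrap resamples in which each predictor is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        for j in backward_eliminate(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_boot


# Simulated data (an assumption): one strong effect, one weak effect,
# and two pure-noise predictors, with a modest sample size.
rng = np.random.default_rng(1)
n = 80
X = rng.normal(size=(n, 4))
y = 1.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

freq = inclusion_frequencies(X, y)
print(freq)
```

Inclusion frequencies well below 1 for a variable that appears in the selected model, or clearly above 0 for a variable that was dropped, indicate that the selection is unstable at this sample size, which is the situation the abstract warns about.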
Pages: 6-10
Page count: 5