Least absolute shrinkage and selection operator type methods for the identification of serum biomarkers of overweight and obesity: simulation and application

Cited by: 168
Authors
Vasquez, Monica M. [1 ,2 ]
Hu, Chengcheng [1 ]
Roe, Denise J. [1 ]
Chen, Zhao [1 ]
Halonen, Marilyn [2 ]
Guerra, Stefano [2 ,3 ]
Affiliations
[1] Univ Arizona, Mel & Enid Zuckerman Coll Publ Hlth, 1295 North Martin Ave,POB 245211, Tucson, AZ 85724 USA
[2] Univ Arizona, Asthma & Airway Dis Res Ctr, 1501 North Campbell Ave,POB 245030, Tucson, AZ 85724 USA
[3] Univ Pompeu Fabra, ISGlobal CREAL Ctr, Barcelona, Spain
Source
BMC MEDICAL RESEARCH METHODOLOGY | 2016 / Vol. 16
Keywords
LASSO; Biomarkers; High-Dimensional; Obesity; Overweight; HORMONE-BINDING-GLOBULIN; SURFACTANT PROTEIN-D; METABOLIC SYNDROME; CARDIOVASCULAR RISK; VARIABLE SELECTION; ASSOCIATION; DISEASE; WOMEN; MEN; REGULARIZATION;
DOI
10.1186/s12874-016-0254-8
Chinese Library Classification number
R19 [Health care organization and services (health services management)];
Abstract
Background: The study of circulating biomarkers and their association with disease outcomes has become progressively complex due to advances in the measurement of these biomarkers through multiplex technologies. The Least Absolute Shrinkage and Selection Operator (LASSO) is a data analysis method that may be utilized for biomarker selection in these high dimensional data. However, it is unclear which LASSO-type method is preferable when considering data scenarios that may be present in serum biomarker research, such as high correlation between biomarkers, weak associations with the outcome, and sparse number of true signals. The goal of this study was to compare the LASSO to five LASSO-type methods given these scenarios.

Methods: A simulation study was performed to compare the LASSO, Adaptive LASSO, Elastic Net, Iterated LASSO, Bootstrap-Enhanced LASSO, and Weighted Fusion for the binary logistic regression model. The simulation study was designed to reflect the data structure of the population-based Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD), specifically the sample size (N = 1000 for total population, 500 for sub-analyses), correlation of biomarkers (0.20, 0.50, 0.80), prevalence of overweight (40%) and obese (12%) outcomes, and the association of outcomes with standardized serum biomarker concentrations (log-odds ratio = 0.05-1.75). Each LASSO-type method was then applied to the TESAOD data of 306 overweight, 66 obese, and 463 normal-weight subjects with a panel of 86 serum biomarkers.

Results: Based on the simulation study, no method had an overall superior performance. The Weighted Fusion correctly identified more true signals, but incorrectly included more noise variables. The LASSO and Elastic Net correctly identified many true signals and excluded more noise variables. In the application study, biomarkers of overweight and obesity selected by all methods were Adiponectin, Apolipoprotein H, Calcitonin, CD14, Complement 3, C-reactive protein, Ferritin, Growth Hormone, Immunoglobulin M, Interleukin-18, Leptin, Monocyte Chemotactic Protein-1, Myoglobin, Sex Hormone Binding Globulin, Surfactant Protein D, and YKL-40.

Conclusions: For the data scenarios examined, choice of optimal LASSO-type method was data structure dependent and should be guided by the research objective. The LASSO-type methods identified biomarkers that have known associations with obesity and obesity related conditions.
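The kind of simulation summarized in the Methods can be illustrated with a minimal sketch. It is not the authors' code or design. It takes N = 1000, a correlation of 0.50, and a 40% target prevalence from the abstract. Everything else is an assumption made for illustration: the equicorrelated structure, the use of 86 simulated biomarkers (the panel size of the application study), the number of true signals (10), their common log-odds ratio (0.5), the fixed penalty `lam`, and the proximal-gradient (ISTA) solver. The intercept is set from the target prevalence, so the realized prevalence only approximates it.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def simulate(n=1000, p=86, rho=0.5, n_signals=10, log_or=0.5,
             prevalence=0.40, seed=0):
    """Simulate standardized, equicorrelated biomarkers and a binary outcome.

    The equicorrelated structure, n_signals, and log_or are assumptions.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n, 1))
    # Each column has variance 1 and pairwise correlation rho.
    X = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:n_signals] = log_or              # sparse set of true signals
    intercept = np.log(prevalence / (1.0 - prevalence))
    y = rng.binomial(1, sigmoid(intercept + X @ beta))
    return X, y, beta

def lasso_logistic(X, y, lam=0.02, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||coef||_1; the intercept is unpenalized.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])
    # Step size 1/L, where L = ||Xa||_2^2 / (4n) bounds the gradient's
    # Lipschitz constant for the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = Xa.T @ (sigmoid(Xa @ w) - y) / n
        w = w - step * grad
        # Soft-thresholding sets small coefficients exactly to zero.
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lam, 0.0)
    return w[0], w[1:]

X, y, beta = simulate()
intercept_hat, coef = lasso_logistic(X, y)
selected = np.flatnonzero(coef)
true_signals = np.flatnonzero(beta)
tp = np.intersect1d(selected, true_signals).size   # true signals selected
fp = selected.size - tp                            # noise variables selected
print(f"selected={selected.size}, true positives={tp}, false positives={fp}")
```

In practice the penalty would typically be chosen by cross-validation rather than fixed, and the true-positive and false-positive counts would be averaged over many simulated data sets for each correlation level.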
Pages: 1 / 19
Number of pages: 19