Extending Classification Algorithms to Case-Control Studies

Cited by: 14
Authors
Stanfill, Bryan [1]
Reehl, Sarah [1]
Bramer, Lisa [1]
Nakayasu, Ernesto S. [2]
Rich, Stephen S. [3]
Metz, Thomas O. [2]
Rewers, Marian [4]
Webb-Robertson, Bobbie-Jo [2]
Affiliations
[1] Pacific Northwest Natl Lab, Comp & Analyt Div, Natl Secur Directorate, Richland, WA 99352 USA
[2] Pacific Northwest Natl Lab, Biol Sci Div, Earth & Biol Sci Directorate, Richland, WA 99352 USA
[3] Univ Virginia, Ctr Publ Hlth Genom, Charlottesville, VA USA
[4] Univ Colorado Denver, Barbara Davis Ctr Childhood Diabet, Aurora, CO USA
Keywords
Diabetes; machine learning; support vector machines; biomarker discovery; variable selection; SERUM ALPHA-TOCOPHEROL; ENVIRONMENTAL DETERMINANTS; DIABETES-MELLITUS; INCREASED RISK; TYPE-1; CHILDREN; ACID; AUTOIMMUNITY; METABOLISM; ACTIVATION;
DOI
10.1177/1179597219858954
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Classification is a common technique applied to 'omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated 'omic signatures. We propose a data preprocessing step which generalizes and makes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, available to be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification and variable selection accuracy of each method is improved after applying this processing step and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.
Pages: 12