Extending Classification Algorithms to Case-Control Studies

Cited by: 14
Authors
Stanfill, Bryan [1]
Reehl, Sarah [1]
Bramer, Lisa [1]
Nakayasu, Ernesto S. [2]
Rich, Stephen S. [3]
Metz, Thomas O. [2]
Rewers, Marian [4]
Webb-Robertson, Bobbie-Jo [2]
Affiliations
[1] Pacific Northwest Natl Lab, Comp & Analyt Div, Natl Secur Directorate, Richland, WA 99352 USA
[2] Pacific Northwest Natl Lab, Biol Sci Div, Earth & Biol Sci Directorate, Richland, WA 99352 USA
[3] Univ Virginia, Ctr Publ Hlth Genom, Charlottesville, VA USA
[4] Univ Colorado Denver, Barbara Davis Ctr Childhood Diabet, Aurora, CO USA
Keywords
Diabetes; machine learning; support vector machines; biomarker discovery; variable selection; SERUM ALPHA-TOCOPHEROL; ENVIRONMENTAL DETERMINANTS; DIABETES-MELLITUS; INCREASED RISK; TYPE-1; CHILDREN; ACID; AUTOIMMUNITY; METABOLISM; ACTIVATION;
DOI
10.1177/1179597219858954
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Classification is a common technique applied to 'omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated 'omic signatures. We propose a data preprocessing step that makes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, available to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification accuracy and the variable selection accuracy of each method improve after applying this processing step, and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.
Pages: 12