Machine learning approaches to predict lupus disease activity from gene expression data

Cited by: 51
Authors
Kegerreis, Brian [1, 2]
Catalina, Michelle D. [1, 2]
Bachali, Prathyusha [1, 2]
Geraci, Nicholas S. [1, 2]
Labonte, Adam C. [1, 2]
Zeng, Chen [3]
Stearrett, Nathaniel [4]
Crandall, Keith A. [4]
Lipsky, Peter E. [1, 2]
Grammer, Amrie C. [1, 2]
Affiliations
[1] RILITE Res Inst, 250 W Main St,Ste 300, Charlottesville, VA 22902 USA
[2] AMPEL BioSolut, 250 W Main St,Ste 300, Charlottesville, VA 22902 USA
[3] George Washington Univ, Dept Phys, Washington, DC 20052 USA
[4] George Washington Univ, Milken Inst Sch Publ Hlth, Computat Biol Inst, Washington, DC 20052 USA
Keywords
LOW-DENSITY GRANULOCYTES; RHEUMATOID-ARTHRITIS; NEUTROPHILS; MACROPHAGES; MODELS; ERYTHEMATOSUS; DAMAGE;
DOI
10.1038/s41598-019-45989-0
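Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];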
Discipline classification code
07; 0710; 09;
Abstract
The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learning approaches to integrate gene expression data from three SLE data sets and used it to classify patients as having active or inactive disease as characterized by standard clinical composite outcome measures. Both raw whole blood gene expression data and informative gene modules generated by Weighted Gene Co-expression Network Analysis from purified leukocyte populations were employed with various classification algorithms. Classifiers were evaluated by 10-fold cross-validation across three combined data sets or by training and testing in independent data sets, the latter of which amplified the effects of technical variation. A random forest classifier achieved a peak classification accuracy of 83 percent under 10-fold cross-validation, but its performance could be severely affected by technical variation among data sets. The use of gene modules rather than raw gene expression was more robust, achieving classification accuracies of approximately 70 percent regardless of how the training and testing sets were formed. Fine-tuning the algorithms and parameter sets may generate sufficient accuracy to be informative as a standalone estimate of disease activity.
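The two evaluation schemes contrasted in the abstract (10-fold cross-validation over pooled data sets versus training and testing on independent data sets) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the cohort names and `batch_shift` offsets are hypothetical, and a nearest-centroid classifier stands in for the random forest so that the example needs only the Python standard library.

```python
import random
from statistics import mean

def make_cohort(name, n, batch_shift, rng):
    """Synthetic cohort (assumption): 2 features, label 1 = 'active'.
    batch_shift mimics a platform-specific technical offset."""
    rows = []
    for i in range(n):
        label = i % 2  # balanced labels
        x = [rng.gauss(2.0 * label + batch_shift, 1.0),
             rng.gauss(-2.0 * label + batch_shift, 1.0)]
        rows.append((x, label, name))
    return rows

def fit_centroids(samples):
    """Nearest-centroid stand-in classifier: one mean vector per label."""
    cents = {}
    for label in (0, 1):
        pts = [x for x, y, _ in samples if y == label]
        cents[label] = [mean(col) for col in zip(*pts)]
    return cents

def predict(cents, x):
    # Pick the label whose centroid has the smallest squared distance to x.
    return min(cents, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(x, cents[lab])))

def accuracy(train, test):
    cents = fit_centroids(train)
    return mean([1.0 if predict(cents, x) == y else 0.0 for x, y, _ in test])

def pooled_kfold(samples, k, rng):
    """k-fold CV over the pooled cohorts: every fold mixes all platforms."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(train, folds[i]))
    return mean(scores)

def leave_one_cohort_out(samples):
    """Train on all cohorts but one, test on the held-out cohort."""
    result = {}
    for held in sorted({c for _, _, c in samples}):
        train = [s for s in samples if s[2] != held]
        test = [s for s in samples if s[2] == held]
        result[held] = accuracy(train, test)
    return result

rng = random.Random(0)
# Hypothetical cohorts; cohort "C" carries a large technical offset.
samples = (make_cohort("A", 40, 0.0, rng)
           + make_cohort("B", 40, 0.5, rng)
           + make_cohort("C", 40, 3.0, rng))

pooled = pooled_kfold(samples, k=10, rng=rng)
per_cohort = leave_one_cohort_out(samples)
print("pooled 10-fold accuracy:", round(pooled, 3))
print("held-out cohort accuracy:", {c: round(a, 3) for c, a in per_cohort.items()})
```

In the pooled scheme every training fold contains samples from every cohort, so a cohort-specific offset is seen during training; in the held-out scheme it is not, which is one way technical variation between data sets can surface as a drop in accuracy.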
Pages: 12
Related papers
50 in total
  • [21] Comparing two machine learning approaches in predicting lupus hospitalization using longitudinal data
    Yijun Zhao
    Dylan Smith
    April Jorge
    Scientific Reports, 12
  • [22] Unveiling Genetic Disorders: Machine Learning and Deep Learning Approaches in Gene Expression Analysis
    Revathi, K.
    Karthikeyan, V. V.
    Priyanka, S.
    Prakash, S. Jaya
    2024 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT CYBER PHYSICAL SYSTEMS AND INTERNET OF THINGS, ICOICI 2024, 2024, : 1315 - 1320
  • [23] MINER: exploratory analysis of gene interaction networks by machine learning from expression data
    Kadupitige, Sidath Randeni
    Leung, Kin Chun
    Sellmeier, Julia
    Sivieng, Jane
    Catchpoole, Daniel R.
    Bain, Michael E.
    Gaeta, Bruno A.
    BMC GENOMICS, 2009, 10
  • [24] MINER: exploratory analysis of gene interaction networks by machine learning from expression data
    Sidath Randeni Kadupitige
    Kin Chun Leung
    Julia Sellmeier
    Jane Sivieng
    Daniel R Catchpoole
    Michael E Bain
    Bruno A Gaëta
    BMC Genomics, 10
  • [25] Gene expression data analysis: a statistical and machine learning perspective
    Chattopadhyay, Amrita
    BIOMETRICS, 2023, 79 (01) : 526 - 528
  • [26] MACHINE LEARNING APPROACHES TO GENE RECOGNITION
    CRAVEN, MW
    SHAVLIK, JW
    IEEE EXPERT-INTELLIGENT SYSTEMS & THEIR APPLICATIONS, 1994, 9 (02): : 2 - 10
  • [27] Developmental gene regulatory network connections predicted by machine learning from gene expression data alone
    Zhang, Jingyi
    Ibrahim, Farhan
    Najmulski, Emily
    Katholos, George
    Altarawy, Doaa
    Heath, Lenwood S.
    Tulin, Sarah L.
    PLOS ONE, 2021, 16 (12):
  • [28] Machine Learning Approaches to Predict Crop Yield Using Integrated Satellite and Climate Data
    Jhajharia K.
    Mathur P.
    Int. J. Ambient Comput. Intell., 2022, 1
  • [29] Gene expression profiling and machine learning to understand and predict primary graft dysfunction
    Ray, Monika
    Dharmarajan, Sekhar
    Freudenberg, Johannes
    Patterson, G. Alexander
    Zhang, Weixiong
    PROCEEDINGS OF THE 7TH IEEE INTERNATIONAL SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING, VOLS I AND II, 2007, : 1076 - +
  • [30] Association of a gene expression profile from whole blood with disease activity in systemic lupus erythaematosus
    Nikpour, M.
    Dempsey, A. A.
    Urowitz, M. B.
    Gladman, D. D.
    Barnes, D. A.
    ANNALS OF THE RHEUMATIC DISEASES, 2008, 67 (08) : 1069 - 1075