Machine learning approaches to predict lupus disease activity from gene expression data

Cited by: 51
Authors
Kegerreis, Brian [1, 2]
Catalina, Michelle D. [1, 2]
Bachali, Prathyusha [1, 2]
Geraci, Nicholas S. [1, 2]
Labonte, Adam C. [1, 2]
Zeng, Chen [3]
Stearrett, Nathaniel [4]
Crandall, Keith A. [4]
Lipsky, Peter E. [1, 2]
Grammer, Amrie C. [1, 2]
Affiliations
[1] RILITE Res Inst, 250 W Main St,Ste 300, Charlottesville, VA 22902 USA
[2] AMPEL BioSolut, 250 W Main St,Ste 300, Charlottesville, VA 22902 USA
[3] George Washington Univ, Dept Phys, Washington, DC 20052 USA
[4] George Washington Univ, Milken Inst Sch Publ Hlth, Computat Biol Inst, Washington, DC 20052 USA
Keywords
LOW-DENSITY GRANULOCYTES; RHEUMATOID-ARTHRITIS; NEUTROPHILS; MACROPHAGES; MODELS; ERYTHEMATOSUS; DAMAGE
DOI
10.1038/s41598-019-45989-0
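Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract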
The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learning approaches to integrate gene expression data from three SLE data sets and used it to classify patients as having active or inactive disease as characterized by standard clinical composite outcome measures. Both raw whole blood gene expression data and informative gene modules generated by Weighted Gene Co-expression Network Analysis from purified leukocyte populations were employed with various classification algorithms. Classifiers were evaluated by 10-fold cross-validation across three combined data sets or by training and testing in independent data sets, the latter of which amplified the effects of technical variation. A random forest classifier achieved a peak classification accuracy of 83 percent under 10-fold cross-validation, but its performance could be severely affected by technical variation among data sets. The use of gene modules rather than raw gene expression was more robust, achieving classification accuracies of approximately 70 percent regardless of how the training and testing sets were formed. Fine-tuning the algorithms and parameter sets may generate sufficient accuracy to be informative as a standalone estimate of disease activity.
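The two evaluation schemes contrasted in the abstract (10-fold cross-validation over the pooled data sets versus training and testing on independent data sets) differ only in how sample indices are split. The following is a minimal sketch of that difference, not the authors' pipeline: the function names, the cohort labels "A", "B", "C", and the cohort sizes are hypothetical and chosen purely for illustration.

```python
import random
from collections import defaultdict


def pooled_kfold(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV over pooled samples."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # mix samples from all data sets
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


def leave_one_dataset_out(dataset_labels):
    """Yield (held_out, train_idx, test_idx), holding out one data set at a time."""
    groups = defaultdict(list)
    for i, name in enumerate(dataset_labels):
        groups[name].append(i)
    for held_out in sorted(groups):
        test = groups[held_out]
        train = [i for name in sorted(groups) if name != held_out
                 for i in groups[name]]
        yield held_out, train, test


# Hypothetical cohort labels: three data sets of different sizes.
dataset_labels = ["A"] * 30 + ["B"] * 20 + ["C"] * 10

pooled_splits = list(pooled_kfold(len(dataset_labels), k=10))
lodo_splits = list(leave_one_dataset_out(dataset_labels))
```

In the pooled scheme every test fold contains samples from the same data sets the classifier was trained on, so platform-specific technical variation is shared between training and testing. In the held-out scheme the test data set is never seen during training, which is the setting the abstract describes as amplifying the effects of technical variation.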
Pages: 12
Related papers
50 in total
  • [1] Machine learning approaches to predict lupus disease activity from gene expression data
    Brian Kegerreis
    Michelle D. Catalina
    Prathyusha Bachali
    Nicholas S. Geraci
    Adam C. Labonte
    Chen Zeng
    Nathaniel Stearrett
    Keith A. Crandall
    Peter E. Lipsky
    Amrie C. Grammer
    Scientific Reports, 9
  • [2] Machine learning approaches for the discovery of gene-gene interactions in disease data
    Upstill-Goddard, Rosanna
    Eccles, Diana
    Fliege, Joerg
    Collins, Andrew
    BRIEFINGS IN BIOINFORMATICS, 2013, 14 (02) : 251 - 260
  • [3] Personalized Machine Learning Algorithm to Predict Ulcerative Colitis Disease Activity from Wearable Data
    Saleh, A.
    Abraham, B.
    JOURNAL OF CROHNS & COLITIS, 2024, 18 : I524 - I524
  • [4] Statistical and Machine Learning Approaches to Predict Gene Regulatory Networks From Transcriptome Datasets
    Mochida, Keiichi
    Koda, Satoru
    Inoue, Komaki
    Nishii, Ryuei
    FRONTIERS IN PLANT SCIENCE, 2018, 9
  • [5] A Machine Learning Approach to Predict the Changes of Brain Functional Connectivity in Autism Spectrum Disorder From the Gene Expression Data
    Choudhery, Sanjeevani
    Huang, Chuan
    Wang, Daifeng
    BIOLOGICAL PSYCHIATRY, 2018, 83 (09) : S227 - S228
  • [6] Machine Learning Approaches to Predict Scoliosis
    Liang, Ruixin
    Yip, Joanne
    To, Kai-Tsun Michael
    Fan, Yunli
    ADVANCES IN HUMAN FACTORS AND ERGONOMICS IN HEALTHCARE AND MEDICAL DEVICES (AHFE 2021), 2021, 263 : 116 - 121
  • [7] Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine
    Vadapalli, Sreya
    Abdelhalim, Habiba
    Zeeshan, Saman
    Ahmed, Zeeshan
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)
  • [8] A Machine Learning Bioinformatics Method to Predict Biological Activity from Biosynthetic Gene Clusters
    Walker, Allison S.
    Clardy, Jon
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2021, 61 (06) : 2560 - 2571
  • [9] Interpretable machine learning identifies paediatric Systemic Lupus Erythematosus subtypes based on gene expression data
    Sara A. Yones
    Alva Annett
    Patricia Stoll
    Klev Diamanti
    Linda Holmfeldt
    Carl Fredrik Barrenäs
    Jennifer R. S. Meadows
    Jan Komorowski
    Scientific Reports, 12
  • [10] Interpretable machine learning identifies paediatric Systemic Lupus Erythematosus subtypes based on gene expression data
    Yones, Sara A.
    Annett, Alva
    Stoll, Patricia
    Diamanti, Klev
    Holmfeldt, Linda
    Barrenas, Carl Fredrik
    Meadows, Jennifer R. S.
    Komorowski, Jan
    SCIENTIFIC REPORTS, 2022, 12 (01)