Machine learning approaches to predict lupus disease activity from gene expression data

Cited by: 51
Authors
Kegerreis, Brian [1 ,2 ]
Catalina, Michelle D. [1 ,2 ]
Bachali, Prathyusha [1 ,2 ]
Geraci, Nicholas S. [1 ,2 ]
Labonte, Adam C. [1 ,2 ]
Zeng, Chen [3 ]
Stearrett, Nathaniel [4 ]
Crandall, Keith A. [4 ]
Lipsky, Peter E. [1 ,2 ]
Grammer, Amrie C. [1 ,2 ]
Affiliations
[1] RILITE Res Inst, 250 W Main St,Ste 300, Charlottesville, VA 22902 USA
[2] AMPEL BioSolut, 250 W Main St,Ste 300, Charlottesville, VA 22902 USA
[3] George Washington Univ, Dept Phys, Washington, DC 20052 USA
[4] George Washington Univ, Milken Inst Sch Publ Hlth, Computat Biol Inst, Washington, DC 20052 USA
Keywords
LOW-DENSITY GRANULOCYTES; RHEUMATOID-ARTHRITIS; NEUTROPHILS; MACROPHAGES; MODELS; ERYTHEMATOSUS; DAMAGE;
DOI
10.1038/s41598-019-45989-0
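Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes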
07 ; 0710 ; 09 ;
Abstract
The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learning approaches to integrate gene expression data from three SLE data sets and used it to classify patients as having active or inactive disease as characterized by standard clinical composite outcome measures. Both raw whole blood gene expression data and informative gene modules generated by Weighted Gene Co-expression Network Analysis from purified leukocyte populations were employed with various classification algorithms. Classifiers were evaluated by 10-fold cross-validation across three combined data sets or by training and testing in independent data sets, the latter of which amplified the effects of technical variation. A random forest classifier achieved a peak classification accuracy of 83 percent under 10-fold cross-validation, but its performance could be severely affected by technical variation among data sets. The use of gene modules rather than raw gene expression was more robust, achieving classification accuracies of approximately 70 percent regardless of how the training and testing sets were formed. Fine-tuning the algorithms and parameter sets may generate sufficient accuracy to be informative as a standalone estimate of disease activity.
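The two evaluation schemes contrasted in the abstract (10-fold cross-validation over the pooled data sets versus training and testing on independent data sets) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the toy data, the per-data-set offset standing in for platform variation, the function names, and the nearest-centroid classifier used in place of the random forest so that the example needs only the Python standard library.

```python
import random
from collections import defaultdict


def make_toy_data(seed=0):
    """Build three hypothetical data sets with a per-data-set offset (toy batch effect)."""
    rng = random.Random(seed)
    X, y, dataset_ids = [], [], []
    for name, offset in (("A", 0.0), ("B", 1.0), ("C", -1.0)):
        for n in range(30):
            label = n % 2  # 1 = active disease, 0 = inactive (toy labels)
            X.append([rng.gauss(2.0 * label + offset, 1.0), rng.gauss(offset, 1.0)])
            y.append(label)
            dataset_ids.append(name)
    return X, y, dataset_ids


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k folds over n shuffled samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def leave_one_dataset_out(dataset_ids):
    """Yield (held_out, train, test): train on all data sets except the held-out one."""
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        yield held_out, train, test


def fit_centroids(X, y):
    """Compute the mean feature vector of each class (stand-in classifier)."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        prev = sums.get(label, [0.0] * len(row))
        sums[label] = [s + v for s, v in zip(prev, row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


def accuracy(X, y, train, test):
    """Fit on the train indices and return the fraction of test indices classified correctly."""
    centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
    hits = sum(predict(centroids, X[i]) == y[i] for i in test)
    return hits / len(test)


if __name__ == "__main__":
    X, y, dataset_ids = make_toy_data()
    cv_scores = [accuracy(X, y, tr, te) for tr, te in kfold_indices(len(X), 10)]
    print("pooled 10-fold mean accuracy:", sum(cv_scores) / len(cv_scores))
    for held_out, tr, te in leave_one_dataset_out(dataset_ids):
        print("held-out data set", held_out, "accuracy:", accuracy(X, y, tr, te))
```

In the pooled scheme every fold mixes samples from all three data sets, so the classifier sees each data set's offset during training. In the held-out scheme the test data set's offset is never seen, which is the setting the abstract describes as amplifying technical variation.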
Pages: 12
Related papers
50 items in total
  • [41] A Robust Procedure for Machine Learning Algorithms Using Gene Expression Data
    Auwul, Md Rabiul
    Zhang, Chongqi
    Shahjaman, Md
    BIOINTERFACE RESEARCH IN APPLIED CHEMISTRY, 2022, 12 (02): : 2422 - 2439
  • [42] Cancer Classification of Gene Expression Data using Machine Learning Models
    De Guia, Joseph M.
    Devaraj, Madhavi
    Vea, Larry A.
    2018 IEEE 10TH INTERNATIONAL CONFERENCE ON HUMANOID, NANOTECHNOLOGY, INFORMATION TECHNOLOGY, COMMUNICATION AND CONTROL, ENVIRONMENT AND MANAGEMENT (HNICEM), 2018,
  • [43] Machine Learning Clustering for Cancer Analysis Employing Gene Expression Data
    Ospino, Camilo Andres Perez
    Rivera, Jorman Arbey Castro
    Orjuela-Canon, Alvaro D.
    2023 IEEE COLOMBIAN CONFERENCE ON APPLICATIONS OF COMPUTATIONAL INTELLIGENCE, COLCACI, 2023,
  • [44] Gene expression data classification using topology and machine learning models
    Dey, Tamal K.
    Mandal, Sayan
    Mukherjee, Soham
    BMC BIOINFORMATICS, 2022, 22 (SUPPL 10)
  • [45] Using Machine Learning to Predict Protein Structure from Spectral Data
    Kinalwa, Myra
    Doig, Andrew J.
    Blanch, Ewan W.
    XXII INTERNATIONAL CONFERENCE ON RAMAN SPECTROSCOPY, 2010, 1267 : 835 - 836
  • [46] Transfer (machine) learning approaches coupled with target data augmentation to predict the mechanical properties of concrete
    Ford, Emily
    Maneparambil, Kailasnath
    Kumar, Aditya
    Sant, Gaurav
    Neithalath, Narayanan
    MACHINE LEARNING WITH APPLICATIONS, 2022, 8
  • [47] Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches
    Cai, Yaping
    Guan, Kaiyu
    Lobell, David
    Potgieter, Andries B.
    Wang, Shaowen
    Peng, Jian
    Xu, Tianfang
    Asseng, Senthold
    Zhang, Yongguang
    You, Liangzhi
    Peng, Bin
    AGRICULTURAL AND FOREST METEOROLOGY, 2019, 274 : 144 - 159
  • [48] Comparison of Machine Learning-based Approaches to Predict the Conversion to Alzheimer's Disease from Mild Cognitive Impairment
    Franciotti, Raffaella
    Nardini, Davide
    Russo, Mirella
    Onofrj, Marco
    Sensi, Stefano L.
    NEUROSCIENCE, 2023, 514 : 143 - 152
  • [49] A Review: Machine Learning and Data Mining Approaches for Cardiovascular Disease Diagnosis and Prediction
    Rao G.S.
    Muneeswari G.
    EAI Endorsed Transactions on Pervasive Health and Technology, 2024, 10
  • [50] Shared and Unique Gene Expression in Systemic Lupus Erythematosus Depending on Disease Activity
    Sandrin-Garcia, Paula
    Junta, Cristina Moraes
    Fachin, Ana L.
    Mello, Stephano S.
    Baiao, Ana Maria T.
    Rassi, Diane M.
    Ferreira, Marcia C. T.
    Trevisan, Glauce L.
    Sakamoto-Hojo, Elza T.
    Louzada-Junior, Paulo
    Passos, Geraldo A. S.
    Donadi, Eduardo A.
    CONTEMPORARY CHALLENGES IN AUTOIMMUNITY, 2009, 1173 : 493 - 500