An unsupervised machine learning method for discovering patient clusters based on genetic signatures

Cited by: 70
Authors
Lopez, Christian [1]
Tucker, Scott [2, 3]
Salameh, Tarik [2]
Tucker, Conrad [1, 4, 5]
Affiliations
[1] Penn State Univ, Ind & Mfg Engn, University Pk, PA 16802 USA
[2] Penn State Univ, Hershey Coll Med, Hershey, PA 17033 USA
[3] Penn State Univ, Engn Sci & Mech, University Pk, PA 16802 USA
[4] Penn State Univ, Engn Design Technol & Profess Programs, University Pk, PA 16802 USA
[5] Penn State Univ, Comp Sci & Engn, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation
Keywords
Unsupervised machine learning; Clustering analysis; Genomic similarity; Multiple sclerosis; EXPRESSION DATA; CLASSIFICATION; OPTIMIZATION; MEDICINE; CANCER; MODEL; RISK; TOOL;
DOI
10.1016/j.jbi.2018.07.004
Chinese Library Classification code
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Introduction: Many chronic disorders have genomic etiology, disease progression, clinical presentation, and response to treatment that vary on a patient-to-patient basis. Such variability creates a need to identify characteristics within patient populations that have clinically relevant predictive value in order to advance personalized medicine. Unsupervised machine learning methods are suitable to address this type of problem, in which no a priori class label information is available to guide the search. However, it is challenging for existing methods to identify cluster memberships that are not just a result of natural sampling variation. Moreover, most current methods require researchers to provide specific input parameters a priori.
Method: This work presents an unsupervised machine learning method to cluster patients based on their genomic makeup without providing input parameters a priori. The method implements internal validity metrics to algorithmically identify the number of clusters, as well as statistical analyses to test for the significance of the results. Furthermore, the method takes advantage of the high degree of linkage disequilibrium between single nucleotide polymorphisms. Finally, a gene pathway analysis is performed to identify potential relationships between the clusters in the context of known biological knowledge.
Datasets and results: The method is tested with a cluster validation dataset and a genomic dataset previously used in the literature. Benchmark results indicate that the proposed method provides the best performance of the methods tested. Furthermore, the method is implemented on a sample genome-wide study dataset of 191 multiple sclerosis patients. The results indicate that the method was able to identify genetically distinct patient clusters without the need to select parameters a priori. Additionally, variants identified as significantly different between clusters are shown to be enriched for protein-protein interactions, especially in immune processes and cell adhesion pathways, via Gene Ontology term analysis.
Conclusion: Once links are drawn between clusters and clinically relevant outcomes, Immunochip data can be used to classify high-risk and newly diagnosed chronic disease patients into known clusters for predictive value. Further investigation can extend beyond pathway analysis to evaluate these clusters for clinical significance of genetically related characteristics such as age of onset, disease course, heritability, and response to treatment.
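
The Method part of the abstract describes using internal validity metrics to pick the number of clusters instead of asking the researcher for it. The abstract does not name the clustering algorithm or the metric, so the following is only a minimal sketch under stated assumptions: it swaps in plain k-means and the mean silhouette width, and runs on synthetic 2-D points rather than genomic data. All function names (`kmeans`, `mean_silhouette`, `choose_k`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation (assumed, not the paper's method)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette width: an internal validity metric that needs no class labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(points, candidates=range(2, 7)):
    """Score every candidate k and return the one with the highest mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic stand-in data: three well-separated groups of 30 points each.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in true_centers
    for _ in range(30)
]

best_k, scores = choose_k(points)
print("selected number of clusters:", best_k)
for k, s in sorted(scores.items()):
    print(f"k={k}: mean silhouette = {s:.3f}")
```

The sketch covers only the choice of cluster count. The significance testing, the use of linkage disequilibrium, and the pathway analysis mentioned in the abstract are outside its scope.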
Pages: 30-39
Number of pages: 10