Machine learning in rare disease

Cited by: 26
Authors
Banerjee, Jineta [1]
Taroni, Jaclyn N. [2]
Allaway, Robert J. [1]
Prasad, Deepashree Venkatesh [2]
Guinney, Justin [1]
Greene, Casey [3]
Affiliations
[1] Sage Bionetworks, Seattle, WA USA
[2] Alex's Lemonade Stand Foundation, Childhood Cancer Data Lab, Philadelphia, PA USA
[3] University of Colorado, School of Medicine, Department of Biomedical Informatics, Aurora, CO 80045 USA
Keywords
GENE; PHENOTYPES; SELECTION; FACE
DOI
10.1038/s41592-023-01886-z
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending upon the complexity of the biological question, machine learning often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases are inherently limited in clinical cases, leading to few samples to study. In this Perspective, we outline the challenges and emerging solutions for using ML for small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for applications beyond rare diseases for which few samples exist with high-dimensional data. We propose that the method community prioritize the development of ML techniques for rare disease research. This Perspective discusses how machine learning can help in studying rare diseases using various emerging approaches.
Pages: 803-814
Number of pages: 12
Related papers
86 in total
  • [1] Next-Generation Sequencing to Diagnose Suspected Genetic Disorders
    Adams, David R.
    Eng, Christine M.
    [J]. NEW ENGLAND JOURNAL OF MEDICINE, 2018, 379 (14) : 1353 - 1362
  • [2] Learning statistical models of phenotypes using noisy labeled training data
    Agarwal, Vibhu
    Podchiyska, Tanya
    Banda, Juan M.
    Goel, Veena
    Leung, Tiffany I.
    Minty, Evan P.
    Sweeney, Timothy E.
    Gyang, Elsie
    Shah, Nigam H.
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2016, 23 (06) : 1166 - 1173
  • [3] The curse(s) of dimensionality
    Altman, Naomi
    Krzywinski, Martin
    [J]. NATURE METHODS, 2018, 15 (06) : 399 - 400
  • [4] A System for Classifying Disease Comorbidity Status from Medical Discharge Summaries Using Automated Hotspot and Negated Concept Detection
    Ambert, Kyle H.
    Cohen, Aaron M.
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2009, 16 (04) : 590 - 595
  • [5] [Anonymous], 1983, ORPHAN DRUG ACT
  • [6] Sensitive detection of rare disease-associated cell subsets via representation learning
    Arvaniti, Eirini
    Claassen, Manfred
    [J]. NATURE COMMUNICATIONS, 2017, 8
  • [7] Integrative Analysis Identifies Candidate Tumor Microenvironment and Intracellular Signaling Pathways that Define Tumor Heterogeneity in NF1
    Banerjee, Jineta
    Allaway, Robert J.
    Taroni, Jaclyn N.
    Baker, Aaron
    Zhang, Xiaochun
    Moon, Chang In
    Pratilas, Christine A.
    Blakeley, Jaishri O.
    Guinney, Justin
    Hirbe, Angela
    Greene, Casey S.
    Gosline, Sara J. C.
    [J]. GENES, 2020, 11 (02)
  • [8] Blitzer J., 2006, P 2006 C EMP METH NA, P120
  • [9] Rare-disease genetics in the era of next-generation sequencing: discovery to translation
    Boycott, Kym M.
    Vanstone, Megan R.
    Bulman, Dennis E.
    MacKenzie, Alex E.
    [J]. NATURE REVIEWS GENETICS, 2013, 14 (10) : 681 - 691
  • [10] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32