Machine learning in rare disease

Citations: 26
Authors
Banerjee, Jineta [1]
Taroni, Jaclyn N. [2]
Allaway, Robert J. [1]
Prasad, Deepashree Venkatesh [2]
Guinney, Justin [1]
Greene, Casey [3]
Affiliations
[1] Sage Bionetworks, Seattle, WA, USA
[2] Alex's Lemonade Stand Foundation, Childhood Cancer Data Lab, Philadelphia, PA, USA
[3] University of Colorado School of Medicine, Department of Biomedical Informatics, Aurora, CO 80045, USA
Keywords
GENE; PHENOTYPES; SELECTION; FACE
DOI
10.1038/s41592-023-01886-z
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending upon the complexity of the biological question, machine learning often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases are inherently limited in clinical cases, leading to few samples to study. In this Perspective, we outline the challenges and emerging solutions for using ML for small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for applications beyond rare diseases for which few samples exist with high-dimensional data. We propose that the method community prioritize the development of ML techniques for rare disease research. This Perspective discusses how machine learning can help in studying rare diseases using various emerging approaches.
Pages: 803-814
Page count: 12