A systematic review on machine learning approaches in the diagnosis and prognosis of rare genetic diseases

Times cited: 19
Authors
Roman-Naranjo, P. [2,3,4]
Parra-Perez, A. M. [2,3,4]
Lopez-Escamez, J. A. [1,2,3,4]
Affiliations
[1] Univ Sydney, Kolling Inst, Sch Med Sci, Fac Med & Hlth,Menieres Dis Neurosci Res Program, Sydney, NSW, Australia
[2] Univ Granada, Dept Surg, Div Otolaryngol, Inst Invest Biosanit,Ibs GRANADA, Granada, Spain
[3] Univ Granada, GENYO Ctr Genom & Oncol Res Pfizer, Dept Genom Med, Otol & Neurotol Grp CTS495,PTS,Junta Andalucia, Granada, Spain
[4] CIBERER, Ctr Invest Biomed Red Enfermedades Raras, Sensorineural Pathol Programme, Madrid, Spain
Keywords
Artificial intelligence; Rare diseases; Precision medicine; Rare variants; DNA-sequencing; Genomics; ARTIFICIAL-INTELLIGENCE; CANCER; VARIANTS
DOI
10.1016/j.jbi.2023.104429
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background: The diagnosis of rare genetic diseases is often challenging due to the complexity of the genetic underpinnings of these conditions and the limited availability of diagnostic tools. Machine learning (ML) algorithms have the potential to improve the accuracy and speed of diagnosis by analyzing large amounts of genomic data and identifying complex multiallelic patterns that may be associated with specific diseases. In this systematic review, we aimed to identify the methodological trends and the application areas of ML in rare genetic diseases.

Methods: We performed a systematic review of the literature following the PRISMA guidelines to search for studies that used ML approaches to enhance the diagnosis of rare genetic diseases. Studies that used DNA-based sequencing data and a variety of ML algorithms were included, summarized, and analyzed using bibliometric methods, visualization tools, and a feature co-occurrence analysis.

Findings: Our search identified 22 studies that met the inclusion criteria. We found that exome sequencing was the most frequently used sequencing technology (59%), and rare neoplastic diseases were the most prevalent disease scenario (59%). In rare neoplasms, the most frequent applications of ML models were the differential diagnosis or stratification of patients (38.5%) and the identification of somatic mutations (30.8%). In other rare diseases, the most frequent goals were the prioritization of rare variants or genes (55.5%) and the identification of biallelic or digenic inheritance (33.3%). The most frequently employed method was the random forest algorithm (54.5%).
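The percentages in the Findings imply whole-study counts. The sketch below back-calculates them, assuming the denominators are the 22 included studies, the neoplasm studies implied by 59% of 22, and the remaining studies on other rare diseases. These counts are our arithmetic reconstruction, not figures reported in the abstract, and `implied_count` is a hypothetical helper.

```python
# Back-calculate the whole-study counts implied by the percentages in the
# Findings. The denominators are assumptions: the 22 included studies, the
# neoplasm studies implied by "59%" of 22, and the remaining studies.

def implied_count(percent: float, denominator: int) -> int:
    """Return the count k in 0..denominator whose share 100*k/denominator
    is closest to the reported percentage."""
    return min(range(denominator + 1),
               key=lambda k: abs(100 * k / denominator - percent))


TOTAL = 22                              # studies meeting the inclusion criteria
NEOPLASMS = implied_count(59, TOTAL)    # studies on rare neoplastic diseases
OTHER = TOTAL - NEOPLASMS               # studies on other rare diseases

# (finding, reported percentage, assumed denominator)
REPORTED = [
    ("exome sequencing", 59, TOTAL),
    ("rare neoplastic diseases", 59, TOTAL),
    ("differential diagnosis or stratification", 38.5, NEOPLASMS),
    ("identification of somatic mutations", 30.8, NEOPLASMS),
    ("prioritization of rare variants or genes", 55.5, OTHER),
    ("biallelic or digenic inheritance", 33.3, OTHER),
    ("random forest", 54.5, TOTAL),
]

for finding, percent, denominator in REPORTED:
    k = implied_count(percent, denominator)
    print(f"{finding}: {k}/{denominator} = {100 * k / denominator:.2f}% "
          f"(reported {percent}%)")
```

Under these assumptions the figures resolve to 13 of 22 studies for exome sequencing and for neoplasms, 5 and 4 of the 13 neoplasm studies, 5 and 3 of the 9 remaining studies, and 12 of 22 for random forest. The 55.5% figure matches 5/9 (about 55.56%) only if it was truncated rather than rounded.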
In addition, the features of the datasets needed to train these algorithms differed depending on the goal pursued: for example, the mutational load in each gene for the differential diagnosis of patients, or the combination of genotype features and sequence-derived features (such as GC-content) for the identification of somatic mutations.

Conclusions: ML algorithms based on sequencing data are mainly used for the diagnosis of rare neoplastic diseases, with random forest being the most common approach. We identified key features in the datasets used for training these ML models according to the objective pursued. These features can support the development of future ML models for the diagnosis of rare genetic diseases.
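The two kinds of training features named above can be illustrated with a short sketch. It assumes variants are already annotated with a gene symbol, that "mutational load" means a simple per-gene count of variants for one patient, and that GC-content is computed over a symmetric window of hypothetical width `flank` around the variant. The function names, the variant allele fraction and read depth fields, the gene labels, and the window size are illustrative assumptions, not the pipelines of the reviewed studies.

```python
from collections import Counter

# Hypothetical feature builders illustrating the two feature types named in
# the abstract. Names, fields and the window size are assumptions.

def gc_content(sequence: str) -> float:
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def mutational_load(variant_genes: list[str]) -> Counter:
    """Per-gene mutational load for one patient: the number of qualifying
    variants observed in each gene."""
    return Counter(variant_genes)


def somatic_feature_row(reference: str, position: int, vaf: float,
                        depth: int, flank: int = 10) -> dict:
    """Combine genotype features (variant allele fraction, read depth) with
    a sequence-derived feature (GC-content of the window of +/- flank bases
    around a 0-based variant position)."""
    start = max(0, position - flank)
    window = reference[start:position + flank + 1]
    return {"vaf": vaf, "depth": depth, "gc_content": gc_content(window)}


# Usage with made-up inputs
load = mutational_load(["GENE_A", "GENE_A", "GENE_B"])
row = somatic_feature_row("AAAAGGGGCCCCTTTT", position=8, vaf=0.12,
                          depth=85, flank=2)
print(load["GENE_A"], row)
```

Rows like these, one per patient for gene-level loads or one per candidate variant for somatic calling, would form the training matrix for a classifier such as a random forest; the classifier itself is omitted here.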
Pages: 8