Aberrant splicing prediction across human tissues

Cited by: 0
Authors
Nils Wagner
Muhammed H. Çelik
Florian R. Hölzlwimmer
Christian Mertes
Holger Prokisch
Vicente A. Yépez
Julien Gagneur
Affiliations
[1] Technical University of Munich, School of Computation, Information and Technology
[2] Helmholtz Association – Munich School for Data Science (MUDS), Munich Data Science Institute
[3] Center for Complex Biological Systems, Institute of Human Genetics, School of Medicine
[4] University of California
[5] Irvine
[6] Technical University of Munich
[7] Technical University of Munich
[8] Computational Health Center
[9] Helmholtz Center Munich
Source
Nature Genetics | 2023 / Vol. 55
Keywords
DOI
Not available
CLC number
Subject classification number
Abstract
Aberrant splicing is a major cause of genetic disorders but its direct detection in transcriptomes is limited to clinically accessible tissues such as skin or body fluids. While DNA-based machine learning models can prioritize rare variants for affecting splicing, their performance in predicting tissue-specific aberrant splicing remains unassessed. Here we generated an aberrant splicing benchmark dataset, spanning over 8.8 million rare variants in 49 human tissues from the Genotype-Tissue Expression (GTEx) dataset. At 20% recall, state-of-the-art DNA-based models achieve maximum 12% precision. By mapping and quantifying tissue-specific splice site usage transcriptome-wide and modeling isoform competition, we increased precision by threefold at the same recall. Integrating RNA-sequencing data of clinically accessible tissues into our model, AbSplice, brought precision to 60%. These results, replicated in two independent cohorts, substantially contribute to noncoding loss-of-function variant identification and to genetic diagnostics design and analytics.
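The abstract reports model quality as precision at a fixed recall (for example, 12% precision at 20% recall). As a minimal sketch of how such a number can be read off a ranked list of predictions, the function below walks down the ranking until the target recall is reached and returns the precision at that cutoff. The function name, the toy scores, and the toy labels are illustrative assumptions; this is not the authors' evaluation code, and it breaks ties between equal scores arbitrarily.

```python
def precision_at_recall(scores, labels, target_recall=0.2):
    """Precision at the first rank cutoff whose recall reaches target_recall.

    scores: predicted scores, higher means more likely aberrant (hypothetical input).
    labels: 1 for a true aberrant splicing event, 0 otherwise (hypothetical input).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        # Recall = true positives found so far / all positives.
        if true_pos / total_pos >= target_recall:
            # Precision = true positives found so far / predictions made so far.
            return true_pos / k

    # Only reached if target_recall is greater than 1.
    return true_pos / len(ranked)


# Toy data: 10 ranked variants, 5 of them truly aberrant.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [0, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(precision_at_recall(toy_scores, toy_labels, target_recall=0.2))
```

In this toy ranking, 20% recall means recovering one of the five positives, which happens at the second-ranked prediction.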
Pages: 861-870
Page count: 9