In silico prediction of splice-altering single nucleotide variants in the human genome

Times cited: 392
Authors
Jian, Xueqiu [1, 2]
Boerwinkle, Eric [1, 2, 3, 4]
Liu, Xiaoming [1, 2]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Div Epidemiol Human Genet & Environm Sci, Sch Publ Hlth, Houston, TX 77030 USA
[2] Univ Texas Hlth Sci Ctr Houston, Human Genet Ctr, Sch Publ Hlth, Houston, TX 77030 USA
[3] Univ Texas Hlth Sci Ctr Houston, Ctr Human Genet, Brown Fdn Inst Mol Med Prevent Human Dis, Houston, TX 77030 USA
[4] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
Funding
National Institutes of Health (NIH), USA
Keywords
GENE; MUTATIONS; DATABASE; CANCER; IDENTIFICATION; ABERRANT; DEFECTS; DISEASE; DBNSFP; SITES
DOI
10.1093/nar/gku1206
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
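The abstract's central idea, scoring a variant by how much it changes the splicing signal instead of only predicting candidate sites, and comparing tools by ROC analysis, can be illustrated with a minimal Python sketch. It assumes a generic log2-odds position weight matrix built from a hypothetical toy set of 9-nt donor sites (`TOY_DONORS`) and uses hypothetical helper names (`build_pwm`, `delta_score`, `roc_auc`). It is not the Position Weight Matrix model, MaxEntScan, or the ensemble scores evaluated in the paper.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical toy training set: aligned 9-nt donor sites (3 exonic + 6 intronic
# positions, canonical GT at indices 3-4). Illustrative only, not data from the paper.
TOY_DONORS = [
    "CAGGTAAGT",
    "AAGGTAAGA",
    "CAGGTGAGT",
    "GAGGTAAGG",
    "CAGGTATGT",
    "AAGGTGAGT",
]


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Return a log2-odds position weight matrix (one dict per position)."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum of per-position log-odds weights for a sequence of the PWM's length."""
    return sum(col[base] for col, base in zip(pwm, seq))


def delta_score(pwm, ref_seq, pos, alt_base):
    """Change in PWM score caused by substituting alt_base at index pos.
    Negative values mean the variant weakens the splice-site signal."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return pwm_score(pwm, alt_seq) - pwm_score(pwm, ref_seq)


def roc_auc(positive_scores, negative_scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    pwm = build_pwm(TOY_DONORS)
    ref = "CAGGTAAGT"
    # Canonical G of the GT dinucleotide mutated to A: large score drop.
    print(delta_score(pwm, ref, 3, "A"))
    # Exonic position -3 (C -> A): much smaller change.
    print(delta_score(pwm, ref, 0, "A"))
    # Toy ranking statistics for known splice-altering (positive) vs neutral variants.
    print(roc_auc([3.7, 2.1], [0.5, 0.0, 2.1]))
```

Because a splice-disrupting variant lowers the score, negated delta scores would be the natural ranking statistic to pass to `roc_auc`. In the study's setting, per-tool scores of this kind then serve as inputs to the ensemble step (adaptive boosting, random forests), which this sketch does not attempt to reproduce.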
Pages: 13534-13544
Number of pages: 11