UnSplicer: mapping spliced RNA-seq reads in compact genomes and filtering noisy splicing

Cited by: 4
Authors
Burns, Paul D. [1]
Li, Yang [2,3]
Ma, Jian [2,3]
Borodovsky, Mark [1,4,5]
Affiliations
[1] Joint Georgia Tech & Emory Wallace H Coulter Dept, Atlanta, GA 30332 USA
[2] Univ Illinois, Dept Bioengn, Urbana, IL 61801 USA
[3] Univ Illinois, Inst Genom Biol, Urbana, IL 61801 USA
[4] Georgia Tech, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[5] Moscow Inst Phys & Technol, Dept Bioinformat, Moscow 141700, Russia
Funding
US National Science Foundation; US National Institutes of Health
Keywords
JUNCTION DETECTION; CODON USAGE; GENE LENGTH; ALIGNMENT; ALGORITHM; TRANSCRIPT; DROSOPHILA; PATTERN;
DOI
10.1093/nar/gkt1141
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Accurate mapping of spliced RNA-Seq reads to genomic DNA is known to be a challenging problem. Despite significant effort invested in developing efficient algorithms, with the human genome as the primary focus, the best solution is still not known. A recently introduced tool, TrueSight, has demonstrated better performance than earlier algorithms such as TopHat and MapSplice. To improve the detection of splice junctions, TrueSight uses information on statistical patterns of nucleotide ordering in intronic and exonic DNA. This line of research led to a new algorithm, UnSplicer, designed for eukaryotic species with compact genomes, where splicing noise is likely to dominate over functional alternative splicing. Genome-specific parameters of the new algorithm are generated by GeneMark-ES, an ab initio gene prediction algorithm based on unsupervised training. UnSplicer shares several components with TrueSight; the differences lie in the training strategy and the classification algorithm. We tested UnSplicer on RNA-Seq data sets of Arabidopsis thaliana, Caenorhabditis elegans, Cryptococcus neoformans and Drosophila melanogaster. We show that splice junctions inferred by UnSplicer agree better with the knowledge accumulated on these well-studied genomes than predictions made by earlier tools.
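The abstract says that TrueSight, and hence UnSplicer, draws on statistical patterns of nucleotide ordering in intronic versus exonic DNA. One common way to capture such patterns is to compare the likelihood of a sequence under two k-th order Markov chains, one trained on introns and one on exons. The Python sketch below illustrates only that general idea; it is not the published TrueSight or UnSplicer code. The function names (`train_markov`, `log_likelihood`, `intron_log_odds`), the Markov order of 2, the pseudocount, and the toy training sequences are all hypothetical choices. In UnSplicer, genome-specific parameters are produced by GeneMark-ES rather than estimated from toy data as here.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) with pseudocount smoothing.

    Hypothetical helper for illustration only; windows containing
    non-ACGT characters (e.g. N) are skipped.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            window = s[i - order:i + 1]
            if all(b in ALPHABET for b in window):
                counts[window[:-1]][window[-1]] += 1.0
    model = {}
    for ctx_bases in product(ALPHABET, repeat=order):
        ctx = "".join(ctx_bases)
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model


def log_likelihood(seq, model, order):
    """Sum of log P(base | context) over every all-ACGT window of `seq`."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        window = seq[i - order:i + 1]
        if all(b in ALPHABET for b in window):
            total += math.log(model[window[:-1]][window[-1]])
    return total


def intron_log_odds(seq, intron_model, exon_model, order=2):
    """Positive when `seq` looks more intron-like than exon-like under the two models."""
    return log_likelihood(seq, intron_model, order) - log_likelihood(seq, exon_model, order)


if __name__ == "__main__":
    # Hypothetical toy training data: AT-rich "introns", GC-rich "exons".
    intron_model = train_markov(["ATATATTTAAATTTATAT" * 5])
    exon_model = train_markov(["GCGCGGCCGCGCCGGC" * 5])
    for candidate in ["ATTTATA", "GCGGCCG"]:
        print(candidate, round(intron_log_odds(candidate, intron_model, exon_model), 3))
```

Run as a script, it prints one log-odds score per candidate; with these toy models the AT-rich candidate scores above zero and the GC-rich one below zero. A content score like this would be only one input; the paper's own training strategy and classification algorithm determine how candidate junctions are actually accepted or filtered.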
Pages: 11