genBlastA: Enabling BLAST to identify homologous gene sequences

Cited by: 250
Authors
She, Rong [2]
Chu, Jeffrey S.-C. [1]
Wang, Ke [2]
Pei, Jian [2]
Chen, Nansheng [1]
Affiliations
[1] Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
ALIGNMENT;
DOI
10.1101/gr.082081.108
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
BLAST is a widely used local similarity search tool for identifying homologous sequences. When a gene sequence (either a protein sequence or a nucleotide sequence) is used as a query to search for homologous sequences in a genome, the search results, reported as a list of high-scoring pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that also contains hundreds to thousands of random HSPs ("noise"). Consequently, BLAST results are often overwhelming and confusing, even to experienced users. To use BLAST effectively, a program is needed that extracts from the entire HSP report the relevant HSPs representing candidate homologous genes. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements, so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionality.
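The shortest-path idea in the abstract can be illustrated with a minimal sketch. It is not the genBlastA implementation: this record does not give the actual edge length metric, so `edge_length` below is a hypothetical stand-in that counts query positions left uncovered between two consecutive HSPs and rejects pairs that are out of order or too far apart on the target. The `HSP` fields, the `max_target_gap` parameter, and the forward-strand-only assumption are likewise illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """One high-scoring pair; half-open coordinates, forward strand only (assumption)."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int
    score: float


def edge_length(a, b, max_target_gap):
    """Hypothetical edge length from HSP a to HSP b; None means no edge."""
    # b must follow a in both query and target (collinearity).
    if b.q_start < a.q_end or b.t_start < a.t_end:
        return None
    # Reject pairs too far apart on the target to belong to one gene.
    if b.t_start - a.t_end > max_target_gap:
        return None
    # Length = number of query positions left uncovered between a and b.
    return b.q_start - a.q_end


def best_hsp_group(hsps, query_len, max_target_gap=1000):
    """Return the HSP chain on the shortest source-to-sink path."""
    if not hsps:
        return []
    # Every edge a -> b has b.q_start >= a.q_end > a.q_start, so sorting
    # by q_start yields a topological order of the HSP graph.
    order = sorted(hsps, key=lambda h: h.q_start)
    # Source edge: length is the uncovered query prefix.
    dist = [h.q_start for h in order]
    prev = [None] * len(order)
    for j, b in enumerate(order):
        for i in range(j):
            w = edge_length(order[i], b, max_target_gap)
            if w is not None and dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                prev[j] = i
    # Sink edge: length is the uncovered query suffix.
    end = min(range(len(order)),
              key=lambda j: dist[j] + (query_len - order[j].q_end))
    path = []
    while end is not None:
        path.append(order[end])
        end = prev[end]
    return path[::-1]
```

With three collinear HSPs and one distant random hit, the shortest path keeps the three collinear HSPs as a single group and leaves the distant hit out. Extracting several candidate genes would require repeating the search on the remaining HSPs or per target region, which this sketch does not attempt.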
Pages: 143-149
Page count: 7