Computational identification of protein-coding sequences by comparative analysis

Cited by: 1
Authors
Fontaine, Arnaud [1]
Touzet, Helene [1]
Affiliations
[1] Univ Lille 1, CNRS, UMR 8022, LIFL, INRIA Sequoia, F-59655 Villeneuve d'Ascq, France
Source
2007 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS | 2007
DOI
10.1109/BIBM.2007.11
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Gene prediction is an essential step in understanding the genome of a species once it has been sequenced. A promising direction in current gene-finding research is comparative genomics. In this paper, we present a novel approach to identifying evolutionarily conserved protein-coding sequences in genomes. The method takes advantage of the specific substitution pattern of coding sequences together with the consistency of reading frames. It has been implemented in a software tool called Protea. Large-scale experimentation shows good results. Protea is intended to be a useful complement to existing tools based on homology search or statistical properties of the sequences.
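The "specific substitution pattern of coding sequences" mentioned in the abstract can be illustrated with a small sketch. It is not Protea's algorithm, which this record does not describe. It only assumes the well-known tendency of substitutions between homologous coding sequences to concentrate at the third codon position, and it takes an ungapped pairwise alignment as input. All function names are hypothetical.

```python
from collections import Counter


def mismatches_by_codon_position(seq_a: str, seq_b: str, frame: int = 0) -> Counter:
    """Count mismatches at codon positions 1, 2 and 3 for a given reading frame.

    Assumes seq_a and seq_b are already aligned without gaps (an assumption
    made for this illustration only).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter({1: 0, 2: 0, 3: 0})
    for i in range(frame, len(seq_a)):
        if seq_a[i] != seq_b[i]:
            # Position within the codon, numbered 1..3 relative to the frame.
            counts[(i - frame) % 3 + 1] += 1
    return counts


def third_position_fraction(seq_a: str, seq_b: str, frame: int = 0) -> float:
    """Fraction of all mismatches that fall on the third codon position."""
    counts = mismatches_by_codon_position(seq_a, seq_b, frame)
    total = sum(counts.values())
    return counts[3] / total if total else 0.0


def best_frame(seq_a: str, seq_b: str) -> int:
    """Return the frame (0, 1 or 2) whose mismatches look most 'coding-like'."""
    return max(range(3), key=lambda f: third_position_fraction(seq_a, seq_b, f))


if __name__ == "__main__":
    # Toy sequences invented for this example; they differ only at third positions.
    a = "ATGGCTAAAGGTCTG"
    b = "ATGGCCAAGGGACTT"
    print(mismatches_by_codon_position(a, b))
    print(best_frame(a, b))
```

A frame in which most mismatches land on the third position is weak evidence that the aligned region is coding in that frame. A real method would also need to handle gaps, more than two sequences, and statistical significance.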
Pages: 95-102
Number of pages: 8