Computational identification of protein-coding sequences by comparative analysis

Times cited: 1
Authors
Fontaine, Arnaud [1]
Touzet, Helene [1]
Affiliation
[1] Univ Lille 1, CNRS, UMR 8022, LIFL,INRIA Sequoia, F-59655 Villeneuve Dascq, France
Source
2007 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS | 2007
DOI
10.1109/BIBM.2007.11
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Gene prediction is an essential step in understanding the genome of a species once it has been sequenced. A promising direction in current gene-finding research is the comparative genomics approach. In this paper, we present a novel approach to identifying evolutionarily conserved protein-coding sequences in genomes. The method exploits the substitution pattern specific to coding sequences, together with the consistency of reading frames. It has been implemented in a software tool called Protea. Large-scale experiments show good results. Protea is intended to complement existing tools based on homology search or on statistical properties of the sequences.
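The abstract does not describe Protea's scoring model, so the following is only a hypothetical illustration of the two signals it names, not the authors' method. In conserved coding DNA, many substitutions are synonymous (they fall mostly at third codon positions and leave the amino acid unchanged), and a genuine reading frame contains no in-frame stop codons. The sketch assumes an ungapped, equal-length pairwise alignment on the forward strand and the standard genetic code. The names `FrameProfile`, `substitution_profile`, `frame_score` and `best_frame` are illustrative inventions.

```python
from dataclasses import dataclass
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


@dataclass
class FrameProfile:
    pos_mismatch: list  # mismatches at codon positions 1, 2, 3 (stop codons excluded)
    syn: int = 0        # codon changes that keep the amino acid
    nonsyn: int = 0     # codon changes that alter the amino acid
    stops: int = 0      # codons read as a stop (*) in either sequence


def substitution_profile(a: str, b: str, frame: int) -> FrameProfile:
    """Hypothetical helper: tally codon-level differences of an ungapped
    alignment read in the given frame (0, 1 or 2)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    prof = FrameProfile(pos_mismatch=[0, 0, 0])
    for i in range(frame, len(a) - 2, 3):
        ca, cb = a[i:i + 3].upper(), b[i:i + 3].upper()
        if ca not in CODE or cb not in CODE:
            continue  # skip codons with gaps or ambiguous bases
        if "*" in (CODE[ca], CODE[cb]):
            prof.stops += 1  # an in-frame stop argues against this frame
            continue
        for k in range(3):
            if ca[k] != cb[k]:
                prof.pos_mismatch[k] += 1
        if ca != cb:
            if CODE[ca] == CODE[cb]:
                prof.syn += 1
            else:
                prof.nonsyn += 1
    return prof


def frame_score(p: FrameProfile) -> float:
    """Crude coding signal: share of changed codons that are synonymous."""
    if p.stops:
        return -1.0  # reading frame broken by a stop codon
    changed = p.syn + p.nonsyn
    return p.syn / changed if changed else 0.0


def best_frame(a: str, b: str) -> int:
    """Return the frame (0-2) with the highest score; ties go to the lowest frame."""
    return max(range(3), key=lambda f: frame_score(substitution_profile(a, b, f)))


if __name__ == "__main__":
    a = "ATGCTGAAAGGTTTC"  # M L K G F
    b = "ATGCTAAAGGGCTTT"  # M L K G F, with only third-position changes
    print(substitution_profile(a, b, 0))
    print(best_frame(a, b))
```

In this toy pair, frame 0 shows only synonymous third-position changes. Frame 1 reads a stop codon, and frame 2 yields only amino-acid-changing differences, so frame 0 is selected. A real tool would also need to handle gaps, both strands, and the statistical significance of the signal.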
Pages: 95-102
Number of pages: 8