Use of Average Mutual Information and Derived Measures to Find Coding Regions

Cited by: 1
Authors
Newcomb, Garin [1]
Sayood, Khalid [1]
Affiliations
[1] Univ Nebraska, Dept Elect & Comp Engn, Lincoln, NE 68588 USA
Keywords
mutual information; DNA annotation; protein coding; microbial gene identification; DNA; prediction; protein
DOI
10.3390/e23101324
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
An important step in genome annotation is identifying the regions of the genome that code for proteins. Most annotation approaches rely on signals extracted from a genomic region to determine whether it is a protein-coding region. Motivated by the fact that these regions are information-bearing structures, we propose signals based on measures derived from the average mutual information for use in this task. We show that these signals can distinguish coding from noncoding sequences with high accuracy. We also show that these signals are robust across species, phyla, and kingdoms and can therefore be used in species-agnostic genome annotation algorithms for identifying protein-coding regions. These, in turn, could be used for gene identification.
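For readers unfamiliar with the quantity named in the abstract, the following is a minimal sketch of the standard average mutual information (AMI) between bases separated by a fixed lag, estimated from pair counts in a single sequence. It is not the authors' implementation, and the derived measures proposed in the paper are not reproduced here; the function names `average_mutual_information` and `ami_profile` are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def average_mutual_information(seq: str, lag: int) -> float:
    """Estimate the AMI (in bits) between symbols `lag` positions apart.

    Assumption: probabilities are plain relative frequencies of the
    observed pairs; no pseudocounts or bias correction are applied.
    """
    seq = seq.upper()
    pairs = [(seq[i], seq[i + lag]) for i in range(len(seq) - lag)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                  # counts of (x, y) pairs
    left = Counter(x for x, _ in pairs)     # marginal counts of x
    right = Counter(y for _, y in pairs)    # marginal counts of y
    ami = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written in terms of raw counts
        ami += p_xy * math.log2(p_xy * n * n / (left[x] * right[y]))
    return ami


def ami_profile(seq: str, max_lag: int) -> list[float]:
    """AMI values for lags 1..max_lag (a simple per-sequence signal)."""
    return [average_mutual_information(seq, k) for k in range(1, max_lag + 1)]
```

A profile such as `ami_profile(window, 9)` computed over a sliding window could serve as one input feature to a coding/noncoding classifier; how the paper actually builds and uses its signals should be taken from the paper itself.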
Pages: 15