Genome phylogeny based on short-range correlations in DNA sequences

Cited by: 20
Authors
Dehnert, M
Plaumann, R
Helm, WE
Hütt, MT
Affiliations
[1] TH Darmstadt, Dept Biol, Bioinformat Grp, D-64287 Darmstadt, Germany
[2] Univ Appl Sci, Math & Sci Fac, D-64295 Darmstadt, Germany
Keywords
information theory; eukaryote genomes; species distinction
DOI
10.1089/cmb.2005.12.545
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The surprising fact that global statistical properties computed on a genome-wide scale can reveal species information was first observed in studies of dinucleotide frequencies. Here we examine the same phenomenon with an entirely different statistical approach. We show that patterns in the short-range statistical correlations of DNA sequences serve as evolutionary fingerprints of eukaryotes. All chromosomes of a species display the same characteristic pattern, which differs markedly from those of other species. Owing to this correlation pattern, the chromosomes of a species are sorted onto the same branch of a phylogenetic tree. The average correlation between nucleotides at a distance k is quantified in two independent ways: (i) by estimating it from a higher-order Markov process and (ii) by computing the mutual information function at distance k. We show how the quality of the phylogenetic reconstruction depends on the range of correlation strengths and on the length of the underlying sequence segment. This concept of the correlation pattern as a phylogenetic signature of eukaryote species combines two rather distant research domains: phylogenetic analysis based on molecular observations and the study of the correlation structure of DNA sequences.
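The mutual information function in (ii) can be illustrated with a minimal Python sketch. It is written under our own assumptions and is not the authors' code: it uses plug-in (frequency-count) estimates without finite-size correction, skips non-ACGT symbols such as N, and the helper names mutual_information, mi_profile and profile_distance are hypothetical. The estimated quantity is I(k) = sum over a,b of p_ab(k) log2[p_ab(k) / (p_a p_b)], where p_ab(k) is the frequency with which nucleotide a is followed by nucleotide b exactly k positions downstream. The Euclidean distance between profiles is likewise only an illustrative choice; the distance measure and tree-building procedure used in the paper may differ.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def mutual_information(seq: str, k: int) -> float:
    """Plug-in estimate (in bits) of I(k) from the pairs (seq[i], seq[i + k]).

    Assumption: non-ACGT symbols (e.g. 'N') are skipped, and the marginals
    p_a, p_b are taken from the pair counts themselves, so the result is the
    exact mutual information of the empirical joint distribution (>= 0).
    """
    if k < 1:
        raise ValueError("distance k must be >= 1")
    seq = seq.upper()
    pairs = Counter(
        (seq[i], seq[i + k])
        for i in range(len(seq) - k)
        if seq[i] in ALPHABET and seq[i + k] in ALPHABET
    )
    n = sum(pairs.values())
    if n == 0:
        return 0.0
    # Marginal counts of the first and second symbol of each pair.
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def mi_profile(seq: str, k_max: int) -> list[float]:
    """Correlation profile [I(1), ..., I(k_max)] of one sequence segment."""
    return [mutual_information(seq, k) for k in range(1, k_max + 1)]


def profile_distance(p: list[float], q: list[float]) -> float:
    """Hypothetical profile distance: Euclidean distance between two profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
```

For example, mutual_information("ACGT" * 100, 4) gives 2.0 bits, because each nucleotide fully determines the one four positions downstream, while a homopolymer such as "A" * 50 gives 0.0. A matrix of pairwise profile distances between chromosomes could then be handed to a distance-based tree program; plug-in estimates of I(k) are biased upward for short sequences, which is one plausible reason why segment length affects reconstruction quality.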
Pages: 545-553
Number of pages: 9
Related papers
30 in total
  • [1] Informatics for unveiling hidden genome signatures
    Abe, T
    Kanaya, S
    Kinouchi, M
    Ichiba, Y
    Kozuki, T
    Ikemura, T
    [J]. GENOME RESEARCH, 2003, 13 (04) : 693 - 702
  • [2] [Anonymous], NPS5578022
  • [3] [Anonymous], 2004, PHYLIP PHYLOGENY INF
  • [4] Alu repeats and human genomic diversity
    Batzer, MA
    Deininger, PL
    [J]. NATURE REVIEWS GENETICS, 2002, 3 (05) : 370 - 379
  • [5] LONG-RANGE CORRELATION-PROPERTIES OF CODING AND NONCODING DNA-SEQUENCES - GENBANK ANALYSIS
    BULDYREV, SV
    GOLDBERGER, AL
    HAVLIN, S
    MANTEGNA, RN
    MATSA, ME
    PENG, CK
    SIMONS, M
    STANLEY, HE
    [J]. PHYSICAL REVIEW E, 1995, 51 (05) : 5084 - 5091
  • [6] A discrete autoregressive process as a model for short-range correlations in DNA sequences
    Dehnert, M
    Helm, WE
    Hütt, MT
    [J]. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2003, 327 (3-4) : 535 - 553
  • [7] Efron B., 1994, INTRO BOOTSTRAP, DOI 10.1201/9780429246593
  • [8] Genome-scale compositional comparisons in eukaryotes
    Gentles, AJ
    Karlin, S
    [J]. GENOME RESEARCH, 2001, 11 (04) : 540 - 546
  • [9] Species independence of mutual information in coding and noncoding DNA
    Grosse, I
    Herzel, H
    Buldyrev, SV
    Stanley, HE
    [J]. PHYSICAL REVIEW E, 2000, 61 (05) : 5624 - 5629
  • [10] Origin and phylogenetic distribution of Alu DNA repeats: Irreversible events in the evolution of primates
    Hamdi, H
    Nishio, H
    Zielinski, R
    Dugaiczyk, A
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1999, 289 (04) : 861 - 871