Hypergeometric model of evolution of conserved protein coding sequences in the proteomes

Cited by: 12
Authors
Kuznetsov, VA [1]
Affiliations
[1] NICHHD, Lab Integrat & Med Biophys, NIH, Bethesda, MD 20892 USA
Source
FLUCTUATION AND NOISE LETTERS | 2003 / Vol. 3 / Issue 03
Keywords
protein domains; motifs; conserved sequences; evolution dynamics; birth-death stochastic processes; Kolmogorov-Waring distribution; Yule distribution; generalized hypergeometric series;
DOI
10.1142/S0219477503001397
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
The diversity of protein sequences that exists today has probably evolved from antecedent evolutionarily conserved domain-like sequences (i.e. motifs, repeats, structural domains) encoded by short ancient genes. We have studied the statistical distributions of the occurrences of the domain-like families within proteins in the proteomes. A generalized hypergeometric stochastic process is introduced in order to model the evolution dynamics of these conserved sequences. We found that the limiting probability function associated with this process fits the empirical distributions for the 90 fully sequenced bacterial, archaeal and eukaryotic organisms. For eukaryotes, our limiting distribution reduces to Waring's distribution. However, for many archaeal and bacterial organisms the empirical distributions degenerate to the Yule-like distribution. Comparison of all of these distributions implies critical evolutionary events that lead to the proportional growth of the number of new protein-coding genes and of proteome complexity in eukaryotic organisms, and suggests that the evolution of many archaeal and bacterial organisms is subject to external global (ecological) forces. Best-fit model data predict that (1) there are only about 5500 distinct InterPro domains in a given higher eukaryotic organism and that (2) a general trend in eukaryotic proteome evolution is described by the increase in frequency of multi-domain proteins composed of already-existing (older) distinct domains as opposed to the creation of new ones. Our model may be applicable to the analysis of the evolution of word distributions in texts and may be used in other large-scale evolutionary systems such as the Internet, the economy and the universe.
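As a rough illustration of the two limiting distributions named in the abstract, the sketch below evaluates the textbook Waring probability function, P(X = k) = ρ (a)_k / (a + ρ)_{k+1} for k = 0, 1, 2, ..., and its Yule special case a = 1. This parametrization, the support starting at 0, and the function names `waring_pmf` and `yule_pmf` are assumptions made for illustration; the paper's Kolmogorov-Waring form may use a different support and different parameters.

```python
import math


def waring_pmf(k: int, rho: float, a: float) -> float:
    """Textbook Waring pmf: rho * (a)_k / (a + rho)_{k+1}, k = 0, 1, 2, ...

    (x)_n is the rising factorial Gamma(x + n) / Gamma(x).
    Log-gamma is used so that large k does not overflow.
    """
    if k < 0:
        return 0.0
    log_p = (
        math.log(rho)
        + math.lgamma(a + k) - math.lgamma(a)                 # log (a)_k
        - (math.lgamma(a + rho + k + 1) - math.lgamma(a + rho))  # log (a+rho)_{k+1}
    )
    return math.exp(log_p)


def yule_pmf(k: int, rho: float) -> float:
    """Yule pmf on k = 0, 1, 2, ...: rho * B(k + 1, rho + 1)."""
    if k < 0:
        return 0.0
    log_beta = math.lgamma(k + 1) + math.lgamma(rho + 1) - math.lgamma(k + rho + 2)
    return rho * math.exp(log_beta)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the heavy tail.
    for k in (0, 1, 10, 100):
        print(k, waring_pmf(k, rho=2.0, a=1.5), yule_pmf(k, rho=2.0))
```

Setting a = 1 in `waring_pmf` reproduces `yule_pmf`, which mirrors the abstract's statement that the more general form degenerates to a Yule-like distribution for some organisms.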
Pages: L295 / L324
Number of pages: 30