Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

Cited by: 7
Authors
Axelsen, Jacob Bock [1, 2]
Yan, Koon-Kiu [1, 3]
Maslov, Sergei [1, 3]
Affiliations
[1] Brookhaven Natl Lab, Dept Condensed Matter Phys & Mat Sci, Upton, NY 11973 USA
[2] Niels Bohr Inst, Ctr Models Life, DK-2100 Copenhagen, Denmark
[3] SUNY Stony Brook, Dept Phys & Astron, Stony Brook, NY 11794 USA
DOI
10.1186/1745-6150-2-32
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: The evolution of the full repertoire of proteins encoded in a given genome is driven mostly by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about the relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins.

Results: We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information, and we apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. The histogram of sequence identities $p$ generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted by a power-law form $\sim p^{-\gamma}$, with the exponent $\gamma$ around 4 for the majority of organisms used in this study. This implies that the intraprotein variability of substitution rates is best described by a Gamma distribution with the exponent $\alpha \approx 0.33$. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate.

Conclusion: We separately measure the short-term ("raw") duplication and deletion rates, $r^{*}_{\mathrm{dup}}$ and $r^{*}_{\mathrm{del}}$, which include gene copies that will be removed soon after the duplication event, and their dramatically reduced long-term counterparts, $r_{\mathrm{dup}}$ and $r_{\mathrm{del}}$. The high deletion rate among recently duplicated proteins is consistent with a scenario in which they did not have enough time to significantly change their functional roles and are therefore to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates, $r^{*}_{\mathrm{del}}$, were shown to increase systematically with $N_{\mathrm{genes}}$. Abnormally flat shapes of the sequence identity histograms observed for yeast and human are consistent with the lineages leading to these organisms having undergone one or more whole-genome duplications. This interpretation is corroborated by our analysis of the genome of Paramecium tetraurelia, where the $p^{-4}$ profile of the histogram is gradually restored by the successive removal of paralogs generated in its four known whole-genome duplication events.
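
The link between a histogram exponent near 4 and a Gamma exponent near 0.33 can be checked numerically with a minimal sketch. It rests on simplifying assumptions that are not spelled out in the abstract: paralogous pairs are created at a constant rate and never deleted, and per-site substitution rates follow a Gamma distribution with shape $\alpha$ and mean 1, so that the expected identity after time $t$ is $p(t) = (1 + t/\alpha)^{-\alpha}$. Under these assumptions the histogram is proportional to $|dt/dp| = p^{-(1 + 1/\alpha)}$, that is, $\gamma = 1 + 1/\alpha$. All function names are illustrative and do not come from the paper.

```python
import math


def identity_after_time(t, alpha):
    """Expected fraction of unchanged residues after time t, assuming
    per-site rates are Gamma-distributed with shape alpha and mean 1."""
    return (1.0 + t / alpha) ** (-alpha)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def histogram_exponent(alpha, t_max=50.0, steps=20000):
    """Estimate gamma in N(p) ~ p**(-gamma).

    Assumption: pairs appear at a constant rate and are never deleted,
    so the density of pairs at identity p is proportional to |dt/dp|.
    """
    ts = [t_max * i / steps for i in range(steps + 1)]
    ps = [identity_after_time(t, alpha) for t in ts]
    mids, density = [], []
    for i in range(steps):
        dp = ps[i] - ps[i + 1]  # identity decreases as time grows
        dt = ts[i + 1] - ts[i]
        mids.append(math.sqrt(ps[i] * ps[i + 1]))  # geometric bin centre
        density.append(dt / dp)  # finite-difference estimate of |dt/dp|
    return -loglog_slope(mids, density)
```

With `alpha = 1/3`, `histogram_exponent` returns a value close to 4, matching the exponent quoted above. Deletions and whole-genome duplications, which the abstract says alter the histogram shape, are deliberately left out of this sketch.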
Pages: 19