FLU, an amino acid substitution model for influenza proteins

Cited by: 45
Authors
Dang, Cuong Cao [2]
Le, Quang Si [1]
Gascuel, Olivier [3]
Le, Vinh Sy [2]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Vietnam Natl Univ, Coll Technol, Hanoi, Vietnam
[3] Univ Montpellier 2, LIRMM, CNRS, Montpellier, France
Keywords
MAXIMUM-LIKELIHOOD-ESTIMATION; DNA-SEQUENCES; TOPOLOGIES; EVOLUTION
DOI
10.1186/1471-2148-10-99
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: The amino acid substitution model is the core component of many protein analysis systems, such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain inappropriate for analyzing specific species, e.g., viruses. Emerging epidemics of influenza viruses raise the need for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to enhance the understanding of the evolution of influenza viruses.
Results: A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from ~113,000 influenza protein sequences, comprising ~20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains ~42 log-likelihood points on an alignment of 300 sites. Moreover, the topologies of trees constructed using FLU and other models frequently differ. FLU therefore affects both the likelihood and the inferred tree topologies. It has been implemented in PhyML and can be downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU, and it is included in the PhyML 3.0 server at http://www.atgc-montpellier.fr/phyml/.
Conclusions: FLU should be useful for any influenza protein analysis system that requires an accurate description of amino acid substitutions.
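To put the reported average gain in perspective, the following minimal sketch converts the abstract's figures (about 42 log-likelihood points on a 300-site alignment) into a per-site gain and the corresponding per-site likelihood ratio. It assumes natural-log likelihoods, which the abstract does not state explicitly; the function name per_site_gain is hypothetical and is not part of PhyML.

```python
import math


def per_site_gain(total_gain: float, n_sites: int) -> float:
    """Average log-likelihood gain per alignment site (hypothetical helper)."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return total_gain / n_sites


# Figures taken from the abstract: ~42 log-likelihood points, 300 sites.
gain = per_site_gain(42.0, 300)

# Assumption: log likelihoods are natural logs, so exp(gain) is the
# average factor by which each site's likelihood improves.
ratio = math.exp(gain)

print(f"{gain:.2f} log-likelihood units per site")  # 0.14
print(f"per-site likelihood ratio: {ratio:.3f}")
```

Read this way, the average improvement is roughly 0.14 log-likelihood units per site. This is only a summary of the reported average and says nothing about how the gain is distributed across individual alignments.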
Pages: 11