LABEL: Fast and Accurate Lineage Assignment with Assessment of H5N1 and H9N2 Influenza A Hemagglutinins

Cited by: 26
Authors
Shepard, Samuel S. [1]
Davis, C. Todd [1]
Bahl, Justin [2, 3]
Rivailler, Pierre [1]
York, Ian A. [1]
Donis, Ruben O. [1]
Affiliations
[1] Ctr Dis Control & Prevent, Influenza Div, Atlanta, GA 30333 USA
[2] Duke NUS Grad Med Sch, Program Emerging Infect Dis, Lab Virus Evolut, Singapore, Singapore
[3] Univ Texas Sch Publ Hlth, Ctr Infect Dis, Houston, TX USA
Keywords
MULTIPLE SEQUENCE ALIGNMENT; MAXIMUM-LIKELIHOOD; MUTATION-RATES; SOUTHERN CHINA; EVOLUTION; VIRUS; CLASSIFICATION; CIRCULATION; PLACEMENT; EMERGENCE
DOI
10.1371/journal.pone.0086921
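Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract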
The evolutionary classification of influenza genes into lineages is a first step in understanding their molecular epidemiology and can inform the subsequent implementation of control measures. We introduce a novel approach called Lineage Assignment By Extended Learning (LABEL) to rapidly determine cladistic information for any number of genes without the need for time-consuming sequence alignment, phylogenetic tree construction, or manual annotation. Instead, LABEL relies on hidden Markov model profiles and support vector machine training to hierarchically classify gene sequences by their similarity to pre-defined lineages. We assessed LABEL by analyzing the annotated hemagglutinin genes of highly pathogenic (H5N1) and low pathogenicity (H9N2) avian influenza A viruses. Using the WHO/FAO/OIE H5N1 evolution working group nomenclature, the LABEL pipeline quickly and accurately identified the H5 lineages of uncharacterized sequences. Moreover, we developed an updated clade nomenclature for the H9 hemagglutinin gene and show a similarly fast and reliable phylogenetic assessment with LABEL. While this study was focused on hemagglutinin sequences, LABEL could be applied to the analysis of any gene and shows great potential to guide molecular epidemiology activities, accelerate database annotation, and provide a data sorting tool for other large-scale bioinformatic studies.
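The hierarchical idea in the abstract (descend a tree of pre-defined lineages, picking the best-matching child at each level) can be illustrated with a minimal sketch. This is not the LABEL implementation: pooled k-mer count profiles with cosine similarity stand in for profile hidden Markov model scores, a plain argmax stands in for the trained support vector machine, and the clade names, sequences, and all function names below are hypothetical toy values.

```python
from collections import Counter
from math import sqrt

K = 3  # k-mer length; an arbitrary choice for this toy example


def kmer_profile(seqs):
    """Pool k-mer counts over a set of sequences (stand-in for a profile HMM)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - K + 1):
            counts[s[i:i + K]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Node:
    """A lineage in the hierarchy; its profile pools all descendant sequences."""

    def __init__(self, name, seqs=(), children=()):
        self.name = name
        self.children = list(children)
        self.seqs = list(seqs) + [s for c in self.children for s in c.seqs]
        self.profile = kmer_profile(self.seqs)


def assign(root, query):
    """Walk down the tree, choosing the most similar child at each level."""
    q = kmer_profile([query])
    node, path = root, []
    while node.children:
        # argmax over similarity scores; LABEL trains an SVM on HMM scores here
        node = max(node.children, key=lambda c: cosine(q, c.profile))
        path.append(node.name)
    return path


# Hypothetical toy lineages (not real influenza sequences or clade names).
a1 = Node("A.1", seqs=["ACGTACGTACGTAAAA"])
a2 = Node("A.2", seqs=["ACGTACGTACGTCCCC"])
root = Node("root", children=[
    Node("A", children=[a1, a2]),
    Node("B", seqs=["GGTTGGTTGGTTGGTT"]),
])

print(assign(root, "ACGTACGTACGTAAAC"))  # prints ['A', 'A.1']
```

Because each query is scored only against the children of the current node, the work per sequence grows with tree depth and branching rather than with the total number of reference sequences, which is the intuition behind avoiding a full alignment and tree reconstruction.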
Pages: 12