COMET: adaptive context-based modeling for ultrafast HIV-1 subtype identification

Cited by: 314
Authors
Struck, Daniel [1]
Lawyer, Glenn [2]
Ternes, Anne-Marie [1]
Schmit, Jean-Claude [1]
Bercoff, Danielle Perez [1]
Affiliations
[1] CRP Sante, Lab Retrovirol, L-1526 Luxembourg, Luxembourg
[2] Max Planck Inst Informat, Dept Computat Biol & Appl Algorithm, D-66123 Saarbrucken, Germany
Keywords
VIRUS TYPE-1 SUBTYPES; DISEASE PROGRESSION; ANTIRETROVIRAL THERAPY; BIOLOGICAL SEQUENCES; MARKOV-MODELS; CLASSIFICATION; RECOMBINATION; COMPRESSION; DETERMINANTS; SURVEILLANCE
DOI
10.1093/nar/gku739
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Viral sequence classification has wide applications in clinical, epidemiological, structural and functional categorization studies. Most existing approaches rely on an initial alignment step followed by classification based on phylogenetic or statistical algorithms. Here we present an ultrafast alignment-free subtyping tool for human immunodeficiency virus type one (HIV-1) adapted from Prediction by Partial Matching compression. This tool, named COMET, was compared to the widely used phylogeny-based REGA and SCUEAL tools using synthetic and clinical HIV data sets (1 090 698 and 10 625 sequences, respectively). COMET's sensitivity and specificity were comparable to or higher than those of the other two subtyping tools on both data sets for known subtypes. COMET also excelled in detecting and identifying new recombinant forms, a frequent feature of the HIV epidemic. Runtime comparisons showed that COMET was almost as fast as USEARCH. This study demonstrates the advantages of alignment-free classification of viral sequences, which feature high rates of variation, recombination and insertions/deletions. COMET is free to use via an online interface.
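The abstract describes COMET only at a high level: it is adapted from Prediction by Partial Matching (PPM) compression and classifies sequences without aligning them. As a rough illustration of the general idea behind compression-based classification, the Python sketch below trains one variable-order context model per reference subtype and assigns a query to the model that encodes it in the fewest bits. This is an assumption-laden toy, not COMET's implementation: the class `PPMModel`, the function `classify`, the method-C escape rule, the omission of PPM's exclusion step, `max_order=4`, and the toy "X"/"Y" sequences are all hypothetical choices made here for illustration.

```python
# Hypothetical sketch of PPM-style classification; NOT COMET's actual implementation.
import math
from collections import defaultdict

ALPHABET_SIZE = 4  # assumed nucleotide alphabet A, C, G, T


class PPMModel:
    """PPM-style variable-order context model (method-C escapes, no exclusion)."""

    def __init__(self, max_order=4):
        self.max_order = max_order
        # counts[context][symbol] = how often `symbol` followed `context` in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seq):
        # Record every symbol under all of its preceding contexts of length 0..max_order.
        for i, sym in enumerate(seq):
            for k in range(min(self.max_order, i) + 1):
                self.counts[seq[i - k:i]][sym] += 1

    def _prob(self, seq, i):
        # Probability of seq[i] given seq[:i], trying the longest context first.
        sym = seq[i]
        p_escape = 1.0
        for k in range(min(self.max_order, i), -1, -1):
            table = self.counts.get(seq[i - k:i])
            if not table:
                continue  # context never seen in training: fall back at no cost
            n = sum(table.values())  # total occurrences of this context
            u = len(table)  # distinct symbols seen after this context
            if sym in table:
                return p_escape * table[sym] / (n + u)
            p_escape *= u / (n + u)  # method-C escape to the next shorter context
        return p_escape / ALPHABET_SIZE  # order -1: uniform over the alphabet

    def code_length(self, seq):
        # Bits needed to encode `seq` under this model (lower means a better fit).
        return -sum(math.log2(self._prob(seq, i)) for i in range(len(seq)))


def classify(query, models):
    """Return the label whose model encodes `query` in the fewest bits."""
    lengths = {label: model.code_length(query) for label, model in models.items()}
    return min(lengths, key=lengths.get), lengths


# Toy usage with two made-up "subtype" references (not real HIV-1 data).
references = {"X": ["ACGT" * 16], "Y": ["AAGGTTCC" * 8]}
models = {}
for label, seqs in references.items():
    models[label] = PPMModel(max_order=4)
    for s in seqs:
        models[label].train(s)

best, bits = classify("ACGTACGTACGT", models)
print(best)  # expected: X
```

Because code length is the negative log-probability of the query under a model, choosing the shortest encoding amounts to choosing the model under which the query is most probable. The sketch does not attempt the recombinant detection that the abstract highlights as one of COMET's strengths.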
Pages: 11