Polymorphisms predicting phylogeny in hepatitis B virus

Cited by: 1
Authors
Lourenco, Jose [1]
McNaughton, Anna L. [2]
Pley, Caitlin [3]
Obolski, Uri [4, 5]
Gupta, Sunetra [6]
Matthews, Philippa C. [7, 8, 9, 10]
Affiliations
[1] Univ Lisbon, Biosyst & Integrat Sci Inst, Fac Sci, BioISI, P-1749016 Lisbon, Portugal
[2] Univ Bristol, Bristol Med Sch, Populat Hlth Sci, 5 Tyndall Ave, Bristol BS8 1UD, England
[3] Guys & St Thomas NHS Fdn Trust, Westminster Bridge Rd, London SE1, England
[4] Tel Aviv Univ, Sch Publ Hlth, IL-6997801 Tel Aviv, Israel
[5] Tel Aviv Univ, Porter Sch Environm & Earth Sci, IL-6997801 Tel Aviv, Israel
[6] Univ Oxford, Dept Zool, Medawar Bldg Pathogen Res, South Parks Rd, Oxford OX1 3SY, England
[7] Francis Crick Inst, 1 Midland Rd, London NW1 1AT, England
[8] UCL, Div Infect & Immun, Gower St, London WC1E 6BT, England
[9] Univ Coll London Hosp, Dept Infect Dis, 250 Euston Rd, London NW1 2PG, England
[10] Univ Oxford, Nuffield Dept Med, Medawar Bldg Pathogen Res, South Parks Rd, Oxford OX1 3SY, England
Funding
Wellcome Trust (UK);
Keywords
HBV; hepatitis B virus; hepadnavirus; diversity; selection; phylogeny; polymorphism; mutation; evolution; genotype; subgenotype; machine learning; covariation; EVOLUTION; ORIGIN; POLYMERASE; REVEALS; GENOME;
DOI
10.1093/ve/veac116
Chinese Library Classification (CLC) number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Hepatitis B viruses (HBVs) are compact viruses with circular genomes of ~3.2 kb in length. Four genes (HBx, Core, Surface, and Polymerase) generating seven products are encoded on overlapping reading frames. Ten HBV genotypes have been characterised (A-J), which may account for differences in transmission, outcomes of infection, and treatment response. However, HBV genotyping is rarely undertaken, and sequencing remains inaccessible in many settings. We set out to assess which amino acid (aa) sites in the HBV genome are most informative for determining genotype, using a machine learning approach based on random forest algorithms (RFA). We downloaded 5,496 genome-length HBV sequences from a public database, excluding recombinant sequences, regions with conserved indels, and genotypes I and J. Each gene was separately translated into aa, and the proteins concatenated into a single sequence (length 1,614 aa). Using RFA, we searched for aa sites predictive of genotype and assessed covariation among the sites with a mutual information-based method. We were able to discriminate confidently between genotypes A-H using ten aa sites. Half of these sites (5/10) were identified in Polymerase (Pol), of which 4/5 were in the spacer domain and one in reverse transcriptase. A further 4/10 sites were located in Surface protein and a single site in HBx. There were no informative sites in Core. Properties of the aa were generally not conserved between genotypes at informative sites. Among the highest co-varying pairs of sites, there were fifty-five pairs that included one of these 'top ten' sites. Overall, we have shown that RFA analysis is a powerful tool for identifying aa sites that predict the HBV lineage, with an unexpectedly high number of such sites in the spacer domain, which has conventionally been viewed as unimportant for structure or function.
Our results improve ease of genotype prediction from limited regions of HBV sequences and may have future applications in understanding HBV evolution.
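
The abstract mentions a mutual information-based method for assessing covariation among aa sites. As a minimal illustration of that general idea only (not the authors' implementation, whose details are not given in this record), the sketch below computes mutual information in bits between two columns of an aligned set of sequences. The toy alignment, the genotype labels, and the helper names `mutual_information` and `column` are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("columns must not be empty")
    count_x = Counter(xs)            # marginal counts for the first column
    count_y = Counter(ys)            # marginal counts for the second column
    count_xy = Counter(zip(xs, ys))  # joint counts for paired symbols
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def column(alignment, i):
    """Return the residues at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


# Hypothetical toy data: six aligned three-residue sequences and their labels.
toy_alignment = ["AKL", "AKL", "SRL", "SRL", "AKL", "SRL"]
toy_genotypes = ["A", "A", "D", "D", "A", "D"]

site0 = column(toy_alignment, 0)
site1 = column(toy_alignment, 1)
site2 = column(toy_alignment, 2)

mi_01 = mutual_information(site0, site1)          # sites 0 and 1 co-vary perfectly
mi_02 = mutual_information(site0, site2)          # site 2 is invariant
mi_0g = mutual_information(site0, toy_genotypes)  # site 0 versus genotype label

print(mi_01, mi_02, mi_0g)
```

In this toy case the two perfectly co-varying sites share one bit of information, while the invariant site shares none. Applying the same function to a site column and the genotype labels gives a crude single-site measure of how informative that site is about genotype; this is a simpler stand-in for, not an equivalent of, the random forest ranking described in the abstract.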
Pages: 6