Variant analysis of SARS-CoV-2 genomes

Cited by: 372
Authors
Koyama, Takahiko [1]
Platt, Daniel [1]
Parida, Laxmi [1]
Affiliations
[1] IBM TJ Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA
Keywords
CLINICAL CHARACTERISTICS; SEQUENCE; SEARCH; ACE2;
DOI
10.2471/BLT.20.253591
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Objective To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Methods Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants by extracting pairwise alignment to the reference genome NC_045512, using EMBOSS needle. Nucleotide variants in the coding regions were converted to corresponding encoded amino acid residues. For clade analysis, we used the open source software Bayesian evolutionary analysis by sampling trees, version 2.5. Findings We identified 5775 distinct genome variants, including 2969 missense mutations, 1965 synonymous mutations, 484 mutations in the non-coding regions, 142 non-coding deletions, 100 in-frame deletions, 66 non-coding insertions, 36 stop-gained variants, 11 frameshift deletions and two in-frame insertions. The most common variants were the synonymous 3037C>T (6334 samples), P4715L in open reading frame 1ab (6319 samples) and D614G in the spike protein (6294 samples). We identified six major clades (that is, basal, D614G, L84S, L3606F, D448del and G392D) and 14 subclades. Regarding the base changes, the C>T mutation was the most common with 1670 distinct variants. Conclusion We found that several variants of the SARS-CoV-2 genome exist and that the D614G clade has become the most common variant since December 2019. The evolutionary analysis indicated structured transmission, with the possibility of multiple introductions into the population.
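The two method steps named in the abstract (reading variants off a pairwise alignment to the reference, then converting coding-region nucleotide changes to amino acid changes) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names call_variants and classify_snv, the toy sequences, and the assumption that the alignment is already available as two gapped strings of equal length are all hypothetical, and special cases such as the ribosomal frameshift in open reading frame 1ab are ignored.

```python
# Illustrative sketch only; names and toy inputs are assumptions, not the paper's code.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def call_variants(ref_aln, qry_aln):
    """List (kind, ref_pos, ref, alt) from two gapped, equal-length strings.

    ref_pos is 1-based on the ungapped reference. For insertions it is the
    reference base immediately before the inserted sequence.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants, ref_pos, i, n = [], 0, 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-":  # gap in reference: insertion in the query
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            variants.append(("ins", ref_pos, "", qry_aln[i:j]))
            i = j
        elif q == "-":  # gap in query: deletion relative to the reference
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("del", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if r != q:
                variants.append(("snv", ref_pos, r, q))
            i += 1
    return variants


def classify_snv(cds, cds_pos, alt):
    """Classify a substitution at 1-based cds_pos of an in-frame coding sequence."""
    codon_index, offset = divmod(cds_pos - 1, 3)
    ref_codon = cds[3 * codon_index : 3 * codon_index + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1 :]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop_gained"
    else:
        effect = "missense"
    return effect, f"{ref_aa}{codon_index + 1}{alt_aa}"


if __name__ == "__main__":
    print(call_variants("ACGT-ACGT", "ATGTTAC-T"))
    print(classify_snv("ATGGATTAT", 5, "G"))
```

On the toy coding sequence ATGGATTAT (Met-Asp-Tyr), changing position 5 from A to G turns codon GAT into GGT, which the sketch reports as a missense change in the same residue-number notation the abstract uses for D614G.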
Pages: 495-504
Page count: 10