A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees

Cited by: 57
Authors
McBroome, Jakob [1, 2]
Thornlow, Bryan [1, 2]
Hinrichs, Angie S. [2]
Kramer, Alexander [1, 2]
De Maio, Nicola [3]
Goldman, Nick [3]
Haussler, David [1, 2]
Corbett-Detig, Russell [1, 2]
Turakhia, Yatish [1, 2]
Affiliations
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Genom Inst, Santa Cruz, CA 95064 USA
[3] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Genome Campus, Cambridge, England
Keywords
COVID-19; SARS-CoV-2; phylogenetics; genomic surveillance; IMMUNODEFICIENCY-VIRUS TYPE-1;
DOI
10.1093/molbev/msab264
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
Pages: 5819-5824
Page count: 6