Founder reconstruction enables scalable and seamless pangenomic analysis

Cited by: 6
Authors
Norri, Tuukka [1]
Cazaux, Bastien [1]
Donges, Saska [1]
Valenzuela, Daniel [1]
Makinen, Veli [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland
Funding
Academy of Finland;
Keywords
READ ALIGNMENT; GENOMES; GRAPHS; SET;
DOI
10.1093/bioinformatics/btab516
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: Variant calling workflows that use a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but combining scalability with seamless integration into existing workflows remains a challenge.
Results: We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts, up to a limit, to produce a founder multiple alignment, which is then indexed using a hybrid scheme that exploits general-purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but its individual steps (e.g. the vcf2multialign tool and founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling.
Pages: 4611-4619
Page count: 9