Ultrafast and scalable variant annotation and prioritization with big functional genomics data

Cited by: 22
Authors
Huang, Dandan [1, 2, 3]
Yi, Xianfu [4 ]
Zhou, Yao [2 ]
Yao, Hongcheng [5 ]
Xu, Hang [1, 5]
Wang, Jianhua [2 ]
Zhang, Shijie [2 ]
Nong, Wenyan [6 ]
Wang, Panwen [7, 8]
Shi, Lei [3 ]
Xuan, Chenghao [3 ]
Li, Miaoxin [9 ]
Wang, Junwen [7, 8]
Li, Weidong [10 ]
Kwan, Hoi Shan [6 ]
Sham, Pak Chung [11 ]
Wang, Kai [12 ]
Li, Mulin Jun [1, 2, 13]
Affiliations
[1] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Prov & Minist Cosponsored Collaborat Innovat Ctr, Natl Clin Res Ctr Canc, Tianjin 300070, Peoples R China
[2] Tianjin Med Univ, Sch Basic Med Sci, Dept Pharmacol, Tianjin Key Lab Inflammat Biol, Tianjin 300070, Peoples R China
[3] Tianjin Med Univ, Sch Basic Med Sci, Dept Biochem & Mol Biol, Tianjin 300070, Peoples R China
[4] Tianjin Med Univ, Sch Biomed Engn, Tianjin 300070, Peoples R China
[5] Univ Hong Kong, Sch Biomed Sci, LKS Fac Med, Hong Kong 999077, Peoples R China
[6] Chinese Univ Hong Kong, Sch Life Sci, Hong Kong 999077, Peoples R China
[7] Mayo Clin, Dept Hlth Sci Res, Scottsdale, AZ 85259 USA
[8] Mayo Clin, Ctr Individualized Med, Scottsdale, AZ 85259 USA
[9] Sun Yat Sen Univ, Zhongshan Sch Med, Affiliated Hosp 1, Ctr Genome Res,Ctr Precis Med, Guangzhou 510080, Peoples R China
[10] Tianjin Med Univ, Sch Basic Med Sci, Dept Genet, Tianjin 300070, Peoples R China
[11] Univ Hong Kong, LKS Fac Med, Dept Psychiat, Ctr Genom Sci, Hong Kong 999077, Peoples R China
[12] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[13] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Tianjin Key Lab Mol Canc Epidemiol, Dept Epidemiol & Biostat, Tianjin 300070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA ELEMENTS; FRAMEWORK; IDENTIFICATION; ENCYCLOPEDIA; PREDICTION; MUTATIONS; DISCOVERY; LOCI
DOI
10.1101/gr.267997.120
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The advances of large-scale genomics studies have enabled compilation of cell type-specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.
Pages: 1789-1801
Page count: 14