SumStatsRehab: an efficient algorithm for GWAS summary statistics assessment and restoration

Cited by: 5
Authors
Matushyn, Mykyta [1]
Bose, Madhuchanda [1]
Mahmoud, Abdallah Amr [1]
Cuthbertson, Lewis [1]
Tello, Carlos [1]
Bircan, Karatug Ozan [1]
Terpolovsky, Andrew [1]
Bamunusinghe, Varuna [1]
Khan, Umar [1]
Novkovic, Biljana [1]
Grabherr, Manfred G. [1]
Yazdi, Puya G. [1]
Affiliations
[1] SelfDecode Com, 1031 Ives Dairy Rd Suite 228-1047, Miami, FL 33179 USA
Keywords
Bioinformatics; GWAS; Summary statistics; PRS; Genetics; Genome-wide association
DOI
10.1186/s12859-022-04920-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Generating polygenic risk scores for diseases and complex traits requires high-quality GWAS summary statistics files. These files are often difficult to acquire because the underlying data are unshared or incomplete. To date, few bioinformatics tools focus on restoring missing columns containing identification and association data, although such restoration has the potential to increase the number of usable GWAS summary statistics files.
Results: SumStatsRehab was able to restore rsID, effect/other alleles, chromosome, base pair position, effect allele frequencies, beta, standard error, and p-values to a better extent than any other currently available tool, with minimal loss.
Conclusions: SumStatsRehab offers a unique tool that combines functional programming with a pipeline-like architecture, allowing users to generate accurate data restorations for incomplete summary statistics files. This in turn increases the number of usable GWAS summary statistics files, which may be invaluable for less researched health traits.
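The abstract lists beta, standard error, and p-values among the columns that can be restored. As an illustration only, the sketch below shows how any one of these could in principle be recovered from the other two, assuming a two-sided Wald test with a normal approximation (z = beta / SE). This record does not describe SumStatsRehab's actual method, so the function names `p_from_beta_se` and `se_from_beta_p` are hypothetical and are not part of the tool.

```python
import math
from statistics import NormalDist


def p_from_beta_se(beta, se):
    """Two-sided p-value from effect size and standard error.

    Assumes z = beta / se follows a standard normal under the null.
    Returns None when the standard error is not positive.
    """
    if se <= 0:
        return None
    z = beta / se
    # 2 * (1 - Phi(|z|)) written with erfc for numerical stability
    return math.erfc(abs(z) / math.sqrt(2.0))


def se_from_beta_p(beta, p):
    """Standard error from effect size and two-sided p-value.

    Inverts the same normal approximation. Returns None when the
    inputs do not determine a standard error (beta == 0, or p outside (0, 1)).
    """
    if beta == 0 or not (0.0 < p < 1.0):
        return None
    z = abs(NormalDist().inv_cdf(p / 2.0))
    return abs(beta) / z


if __name__ == "__main__":
    p = p_from_beta_se(0.5, 0.1)
    print(p, se_from_beta_p(0.5, p))
```

The sign of beta cannot be recovered from a p-value alone, which is why only the standard error and p-value directions are sketched here.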
Pages: 12