AccuCalc: A Python Package for Accuracy Calculation in GWAS

Cited by: 4
Authors
Biova, Jana [1 ]
Dietz, Nicholas [2 ]
Chan, Yen On [3 ,4 ]
Joshi, Trupti [3 ,4 ,5 ,6 ]
Bilyeu, Kristin [7 ]
Skrabisova, Maria [1 ]
Affiliations
[1] Palacky Univ Olomouc, Fac Sci, Dept Biochem, Olomouc 78371, Czech Republic
[2] Univ Missouri, Div Plant Sci, Columbia, MO 65201 USA
[3] Univ Missouri, Christopher S Bond Life Sci Ctr, Columbia, MO 65212 USA
[4] Univ Missouri, MU Data Sci & Informat Inst, Columbia, MO 65212 USA
[5] Univ Missouri, Dept Elect Engn & Comp Sci, Columbia, MO 65212 USA
[6] Univ Missouri, Sch Med, Dept Hlth Management & Informat, Columbia, MO 65212 USA
[7] Univ Missouri, USDA ARS, Plant Genet Res Unit, Columbia, MO 65211 USA
Keywords
Python package; GWAS; accuracy; causative mutation; SP2CM; Manhattan plot
DOI
10.3390/genes14010123
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The genome-wide association study (GWAS) is a popular genomic approach that identifies genomic regions associated with a phenotype and, thus, aims to discover causative mutations (CM) in the genes underlying the phenotype. However, GWAS discoveries are limited by many factors and typically identify associated genomic regions without the further ability to compare the viability of candidate genes and actual CMs. Therefore, the current methodology is limited to CM identification. In our recent work, we presented a novel approach to an empowered "GWAS to Genes" strategy that we named Synthetic phenotype to causative mutation (SP2CM). We established this strategy to identify CMs in soybean genes and developed a web-based tool for accuracy calculation (AccuTool) for a reference panel of soybean accessions. Here, we describe our further development of the tool that extends its utilization for other species and named it AccuCalc. We enhanced the tool for the analysis of datasets with a low-frequency distribution of a rare phenotype by automated formatting of a synthetic phenotype and added another accuracy-based GWAS evaluation criterion to the accuracy calculation. We designed AccuCalc as a Python package for GWAS data analysis for any user-defined species-independent variant calling format (vcf) or HapMap format (hmp) as input data. AccuCalc saves analysis outputs in user-friendly tab-delimited formats and also offers visualization of the GWAS results as Manhattan plots accentuated by accuracy. Under the hood of Python, AccuCalc is publicly available and, thus, can be used conveniently for the SP2CM strategy utilization for every species.
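The abstract does not spell out the accuracy formula, so the following is only a minimal sketch of one plausible reading: for a single variant position, accuracy is the fraction of accessions whose allele state agrees with a binary phenotype. The function name `variant_accuracy`, the `"WT"`/`"MUT"` labels, and the missing-call codes are all assumptions made for illustration; none of them are AccuCalc's actual API or definition.

```python
import math
from typing import Sequence

# Assumed codes for missing genotype calls (hypothetical, not from AccuCalc).
MISSING = {"N", ".", ""}


def variant_accuracy(
    genotypes: Sequence[str],
    phenotypes: Sequence[str],
    alt_allele: str,
    mutant_label: str = "MUT",
) -> float:
    """Return the fraction of accessions where carrying the alternate allele
    coincides with having the mutant phenotype (and vice versa).

    Accessions with a missing genotype call are skipped.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")

    # Keep only accessions with a usable genotype call.
    pairs = [(g, p) for g, p in zip(genotypes, phenotypes) if g not in MISSING]
    if not pairs:
        return math.nan

    # A match means: alt allele with mutant phenotype, or ref allele with wild type.
    matches = sum((g == alt_allele) == (p == mutant_label) for g, p in pairs)
    return matches / len(pairs)


# Toy data: five accessions at one position, one missing call.
genos = ["A", "A", "T", "T", "N"]
phenos = ["WT", "WT", "MUT", "WT", "MUT"]
accuracy = variant_accuracy(genos, phenos, alt_allele="T")
```

In this toy case three of the four called accessions agree with the phenotype, so the position scores 0.75. A score like this could be attached to each GWAS position, for instance to colour the points of a Manhattan plot.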
Pages: 13