Comparing Four Genome-Wide Association Study (GWAS) Programs with Varied Input Data Quantity

Times cited: 0
Authors
Yan, Yan [1]
Burbridge, Connor [1]
Shi, Jinhong [1]
Liu, Juxin [2]
Kusalik, Anthony [1]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
[2] Univ Saskatchewan, Dept Math & Stat, Saskatoon, SK, Canada
Source
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018
Keywords
Genome-Wide Association Study (GWAS); Arabidopsis thaliana; plant phenomics; plant genomics; PLINK; TASSEL; GAPIT; FaST-LMM; POWER
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Genome-wide association studies (GWAS) have served as a primary method over the past decade for identifying associations between genetic variants and traits or diseases. Many software packages based on different statistical models have been developed for GWAS analysis. One key factor influencing the statistical reliability of GWAS is the amount of input data used. Few studies have investigated this effect by comparing the performance of GWAS programs on varied amounts of experimental data, especially in the context of plants and plant genomes. In this paper, we investigate how input data quantity influences the output of four widely used GWAS programs: PLINK, TASSEL, GAPIT, and FaST-LMM. Both synthetic and real data are used. Standard GWAS output consists of single nucleotide polymorphisms (SNPs) and their p-values. To evaluate the programs, we use the p-values and q-values of SNPs and the Kendall rank correlation between output SNP lists. Results show that, with the same GWAS program, different Arabidopsis thaliana datasets show similar trends in rank correlation as input quantity varies, but differ in the numbers of SNPs passing a given p- or q-value threshold. In practice, experimental datasets may contain samples with varied numbers of biological replicates. We show that this variation in replicates influences the p-values of SNPs but does not strongly affect the rank correlation. When comparing synthetic and real data, the output SNPs from synthetic data show similar rank correlation trends across all four GWAS programs, whereas the same measure from real data varies across the programs. In addition, the real data show a near-linear increase in the number of significant SNPs as more input data are used, but the synthetic data do not follow this trend. This study provides guidance on selecting GWAS programs when the quantity of experimental data varies and on selecting significant SNPs for subsequent study. It also contributes to understanding how much input data is necessary to yield satisfactory GWAS results.
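The evaluation measures named in the abstract (counts of SNPs passing a p- or q-value threshold, and the Kendall rank correlation between output SNP lists) can be illustrated with a short Python sketch. It is not the authors' code. It assumes that q-values are Benjamini-Hochberg adjusted p-values (the record does not say which q-value method was used) and that rank agreement is Kendall's tau-b computed on the p-values of SNPs present in both runs. The helper names and the toy p-values are hypothetical.

```python
"""Sketch of the evaluation measures named in the abstract (hypothetical helpers).

Assumptions: q-values are Benjamini-Hochberg adjusted p-values, and rank
agreement is Kendall's tau-b on the p-values of SNPs shared by two runs.
"""
from itertools import combinations
from math import sqrt


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, used here as stand-in q-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0  # caps every q-value at 1
    # Walk from the largest p-value down so each q is the minimum over ranks >= k.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def count_significant(values, threshold):
    """Number of SNPs whose p- or q-value is at or below the threshold."""
    return sum(1 for v in values if v <= threshold)


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2), tie-aware)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        # Compare signs instead of multiplying differences, so tiny p-values
        # (e.g. 1e-300) cannot underflow to zero.
        sx = (x[i] > x[j]) - (x[i] < x[j])
        sy = (y[i] > y[j]) - (y[i] < y[j])
        if sx == 0:
            ties_x += 1
        if sy == 0:
            ties_y += 1
        if sx * sy > 0:
            concordant += 1
        elif sx * sy < 0:
            discordant += 1
    total = n * (n - 1) // 2
    denom = sqrt((total - ties_x) * (total - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one sequence is constant")
    return (concordant - discordant) / denom


def rank_agreement(run_a, run_b):
    """Kendall's tau-b on p-values of SNPs present in both runs ({snp_id: p})."""
    shared = sorted(set(run_a) & set(run_b))
    return kendall_tau_b([run_a[s] for s in shared], [run_b[s] for s in shared])


if __name__ == "__main__":
    # Toy p-values (hypothetical): the same SNPs analysed with more and less input data.
    run_full = {"snp1": 1e-8, "snp2": 3e-5, "snp3": 0.01, "snp4": 0.2, "snp5": 0.7}
    run_half = {"snp1": 2e-6, "snp2": 0.004, "snp3": 0.002, "snp4": 0.3, "snp5": 0.9}

    p_full = list(run_full.values())
    print(rank_agreement(run_full, run_half))           # 0.8 (one swapped pair)
    print(count_significant(p_full, 0.01))              # 3 SNPs with p <= 0.01
    print(count_significant(bh_qvalues(p_full), 0.01))  # 2 SNPs with q <= 0.01
```

Because Kendall's tau depends only on the ordering of values, correlating raw p-values gives the same result as correlating SNP ranks. For genome-scale SNP lists, `scipy.stats.kendalltau` (whose default variant is tau-b) is a faster alternative to this O(n^2) loop.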
Pages: 1802-1809
Number of pages: 8
Related papers (50 in total)
  • [1] Effects of input data quantity on genome-wide association studies (GWAS)
    Yan, Yan
    Burbridge, Connor
    Shi, Jinhong
    Liu, Juxin
    Kusalik, Anthony
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2019, 22 (01) : 19 - 43
  • [2] A Genome-Wide Association Study (GWAS) for Bronchopulmonary Dysplasia
    Wang, Hui
    St Julien, Krystal R.
    Stevenson, David K.
    Hoffmann, Thomas J.
    Witte, John S.
    Lazzeroni, Laura C.
    Krasnow, Mark A.
    Quaintance, Cecele C.
    Oehlert, John W.
    Jelliffe-Pawlowski, Laura L.
    Gould, Jeffrey B.
    Shaw, Gary M.
    O'Brodovich, Hugh M.
    PEDIATRICS, 2013, 132 (02) : 290 - 297
  • [3] Genome-wide association study (GWAS) of leaf wax components of apple
    Cao, Fuguo
    Li, Zhongxing
    Jiang, Lijuan
    Liu, Chen
    Qian, Qian
    Yang, Feng
    Ma, Fengwang
    Guan, Qingmei
    STRESS BIOLOGY, 2021, 1 (01)
  • [4] Genome-Wide Association Study (GWAS) of Mesocotyl Length for Direct Seeding in Rice
    Jang, Seong-Gyu
    Park, So-Yeon
    Lar, San Mar
    Zhang, Hongjia
    Lee, Ah-Rim
    Cao, Fang-Yuan
    Seo, Jeonghwan
    Ham, Tae-Ho
    Lee, Joohyun
    Kwon, Soon-Wook
    AGRONOMY-BASEL, 2021, 11 (12)
  • [5] Genome-Wide Association Mapping With Longitudinal Data
    Furlotte, Nicholas A.
    Eskin, Eleazar
    Eyheramendy, Susana
    GENETIC EPIDEMIOLOGY, 2012, 36 (05) : 463 - 471
  • [6] Independent and Joint-GWAS for growth traits in Eucalyptus by assembling genome-wide data for 3373 individuals across four breeding populations
    Mueller, Barbara S. F.
    de Almeida Filho, Janeo E.
    Lima, Bruno M.
    Garcia, Carla C.
    Missiaggia, Alexandre
    Aguiar, Aurelio M.
    Takahashi, Elizabete
    Kirst, Matias
    Gezan, Salvador A.
    Silva-Junior, Orzenil B.
    Neves, Leandro G.
    Grattapaglia, Dario
    NEW PHYTOLOGIST, 2019, 221 (02) : 818 - 833
  • [7] Genome-wide association study (GWAS) reveals genetic basis of ear-related traits in maize
    Yang, Lin
    Li, Ting
    Tian, Xiaokang
    Yang, Bingpeng
    Lao, Yonghui
    Wang, Yahui
    Zhang, Xinghua
    Xue, Jiquan
    Xu, Shutu
    EUPHYTICA, 2020, 216 (11)
  • [8] Analysis of genome-wide association study data using the protein knowledge base
    Ballouz, Sara
    Liu, Jason Y.
    Oti, Martin
    Gaeta, Bruno
    Fatkin, Diane
    Bahlo, Melanie
    Wouters, Merridee A.
    BMC GENETICS, 2011, 12
  • [9] A genome-wide association study of malting quality across eight US barley breeding programs
    Mohammadi, Mohsen
    Blake, Thomas K.
    Budde, Allen D.
    Chao, Shiaoman
    Hayes, Patrick M.
    Horsley, Richard D.
    Obert, Donald E.
    Ullrich, Steven E.
    Smith, Kevin P.
    THEORETICAL AND APPLIED GENETICS, 2015, 128 (04) : 705 - 721
  • [10] Replication of a genome-wide association study of panic disorder in a Japanese population
    Otowa, Takeshi
    Tanii, Hisashi
    Sugaya, Nagisa
    Yoshida, Eiji
    Inoue, Ken
    Yasuda, Shin
    Shimada, Takafumi
    Kawamura, Yoshiya
    Tochigi, Mamoru
    Minato, Takanobu
    Umekage, Tadashi
    Miyagawa, Taku
    Nishida, Nao
    Tokunaga, Katsushi
    Okazaki, Yuji
    Kaiya, Hisanobu
    Sasaki, Tsukasa
    JOURNAL OF HUMAN GENETICS, 2010, 55 (02) : 91 - 96