Comparing Four Genome-Wide Association Study (GWAS) Programs with Varied Input Data Quantity

Cited by: 0
Authors
Yan, Yan [1]
Burbridge, Connor [1]
Shi, Jinhong [1]
Liu, Juxin [2]
Kusalik, Anthony [1]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
[2] Univ Saskatchewan, Dept Math & Stat, Saskatoon, SK, Canada
Source
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2018
Keywords
Genome-Wide Association Study (GWAS); Arabidopsis thaliana; plant phenomics; plant genomics; PLINK; TASSEL; GAPIT; FaST-LMM; POWER
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Genome-wide association studies (GWAS) have served for the past decade as a primary method for identifying associations between genetic variants and traits or diseases. Many software packages have been developed for GWAS analysis based on different statistical models. One key factor influencing the statistical reliability of GWAS is the amount of input data used. Few studies have investigated this effect by comparing the performance of GWAS programs using varied amounts of experimental data, especially in the context of plants and plant genomes. In this paper, we investigate how input data quantity influences the output of four widely used GWAS programs: PLINK, TASSEL, GAPIT, and FaST-LMM. Both synthetic and real data are used. Standard GWAS output includes single nucleotide polymorphisms (SNPs) and their p-values. To evaluate the programs, we use the p-values and q-values of SNPs and the Kendall rank correlation between output SNP lists. Results show that with the same GWAS program, different Arabidopsis thaliana datasets demonstrate similar trends of rank correlation with varied input quantity, but differ in the numbers of SNPs passing a given p- or q-value threshold. In practice, experimental datasets may have samples containing varied numbers of biological replicates. We show that this variation in replicates influences the p-values of SNPs, but does not strongly affect the rank correlation. When comparing synthetic and real data, the output SNPs from synthetic data have similar rank correlation trends across all four GWAS programs, whereas the same measure from real data varies across the programs. In addition, the real data yield a roughly linear increase in the number of significant SNPs with more input data, but the synthetic data do not follow this trend. This study provides guidance on selecting GWAS programs when varied experimental data are present and on selecting significant SNPs for subsequent study. It contributes to understanding how much input data is necessary to yield satisfactory GWAS results.
Pages: 1802-1809
Page count: 8