A reference haplotype panel for genome-wide imputation of short tandem repeats

Times cited: 46
Authors
Saini, Shubham [1]
Mitra, Ileena [2]
Mousavi, Nima [3]
Fotsing, Stephanie Feupe [2,4]
Gymrek, Melissa [1,5]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, 9500 Gilman Dr, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Bioinformat & Syst Biol Program, 9500 Gilman Dr, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Dept Elect & Comp Engn, 9500 Gilman Dr, La Jolla, CA 92093 USA
[4] Univ Calif San Diego, Dept Biomed Informat, 9500 Gilman Dr, La Jolla, CA 92093 USA
[5] Univ Calif San Diego, Dept Med, 9500 Gilman Dr, La Jolla, CA 92093 USA
Funding
United States National Institutes of Health; United States National Science Foundation
Keywords
GENE-EXPRESSION VARIATION; LINKAGE DISEQUILIBRIUM; DNA METHYLATION; CAG REPEAT; EXPANSION; MICROSATELLITE; VARIANTS; MUTATION; DISEASE; ASSOCIATION;
DOI
10.1038/s41467-018-06694-0
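Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09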
Abstract
Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in complex traits. However, genotyping arrays used in genome-wide association studies focus on single nucleotide polymorphisms (SNPs) and do not readily allow identification of STR associations. We leverage next-generation sequencing (NGS) from 479 families to create a SNP + STR reference haplotype panel. Our panel enables imputing STR genotypes into SNP array data when NGS is not available for directly genotyping STRs. Imputed genotypes achieve mean concordance of 97% with observed genotypes in an external dataset, compared to 71% expected under a naive model. Performance varies widely across STRs, with near-perfect concordance at bi-allelic STRs vs. 70% at highly polymorphic repeats. Imputation increases power over individual SNPs to detect STR associations with gene expression. Imputing STRs into existing SNP datasets will enable the first large-scale STR association studies across a range of complex traits.
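The abstract's concordance figures compare each imputed STR genotype with the genotype observed directly from sequencing. The sketch below illustrates that comparison under stated assumptions, not the authors' pipeline: genotypes are represented as unordered pairs of allele lengths, "concordance" is taken to mean an exact genotype match, and the naive baseline is assumed to be the expected match rate if each imputed genotype were drawn at random from the observed genotype frequencies (the sum of squared genotype frequencies). The function names are hypothetical, and this baseline is not necessarily the naive model used in the paper.

```python
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical representation: a diploid STR genotype as two allele lengths.
Genotype = Tuple[int, int]


def normalize(gt: Genotype) -> Genotype:
    """Treat genotypes as unordered, so (12, 10) and (10, 12) are the same call."""
    a, b = gt
    return (a, b) if a <= b else (b, a)


def concordance(imputed: Sequence[Genotype], observed: Sequence[Genotype]) -> float:
    """Fraction of samples whose imputed genotype exactly matches the observed one."""
    if len(imputed) != len(observed) or not observed:
        raise ValueError("need two non-empty genotype lists of equal length")
    matches = sum(normalize(i) == normalize(o) for i, o in zip(imputed, observed))
    return matches / len(observed)


def naive_expected_concordance(observed: Sequence[Genotype]) -> float:
    """Assumed baseline: expected concordance if each imputed genotype were drawn
    at random from the observed genotype frequencies, i.e. sum of p_g ** 2."""
    counts = Counter(normalize(g) for g in observed)
    n = len(observed)
    return sum((c / n) ** 2 for c in counts.values())


# Toy data (made up for illustration): four samples at a single STR locus.
observed = [(10, 12), (10, 10), (12, 12), (10, 12)]
imputed = [(12, 10), (10, 10), (10, 12), (10, 12)]
print(concordance(imputed, observed))         # 0.75
print(naive_expected_concordance(observed))   # 0.375
```

Because this baseline is a sum of squared genotype frequencies, it shrinks as a locus's genotypes spread across more alleles, so random matching is far less likely at a highly polymorphic STR than at a bi-allelic one.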
Pages: 11