locStra: Fast analysis of regional/global stratification in whole-genome sequencing studies

Cited by: 5
Authors
Hahn, Georg [1]
Lutz, Sharon M. [1]
Hecker, Julian [2]
Prokopenko, Dmitry [3]
Cho, Michael H. [2]
Silverman, Edwin K. [2]
Weiss, Scott T. [2]
Lange, Christoph [1]
Affiliations
[1] Harvard University, T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA 02115, USA
[2] Harvard University, Brigham and Women's Hospital, Department of Medicine, Boston, MA 02115, USA
[3] Harvard University, Massachusetts General Hospital, Boston, MA, USA
Keywords
regional analysis; population stratification; population substructure; similarity matrix; whole-genome sequencing; genetic association analysis; local ancestry; rare variants; inference; linkage
DOI
10.1002/gepi.22356
Chinese Library Classification (CLC)
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed with the genetic covariance matrix, the genomic relationship matrix, and the unweighted or weighted genetic Jaccard similarity matrix. Using a sliding-window approach, the regional similarity matrices are compared with the global ones based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for choosing the window size is provided. Because the implementation is written in C++ and fully exploits sparse matrix algebra, the analysis is highly efficient. Even on a single core, for realistic study sizes (several thousand subjects, several million rare variants per subject), the genome-wide computation of all regional similarity matrices typically takes no more than one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and the beneficial effects of accounting for it in association testing.
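To illustrate the sliding-window comparison of regional and global similarity matrices described above, here is a minimal sketch in plain Python (standard library only). It is not locStra's implementation or API. It assumes the following: only the unweighted Jaccard similarity is used; regional and global structure are compared through the absolute Pearson correlation of their leading eigenvectors; a window is a run of consecutive variant columns advanced by a fixed step; and two subjects who carry no variants in a window get similarity 0. All function names (`jaccard_matrix`, `leading_eigenvector`, `pearson`, `sliding_window_scan`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def jaccard_matrix(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Unweighted Jaccard similarity between subjects (rows).

    A subject "carries" a variant if its genotype count is non-zero.
    Pairs whose variant sets are both empty get similarity 0 (assumption).
    """
    sets = [{k for k, g in enumerate(row) if g != 0} for row in genotypes]
    n = len(sets)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            union = len(sets[i] | sets[j])
            value = len(sets[i] & sets[j]) / union if union else 0.0
            sim[i][j] = sim[j][i] = value
    return sim


def leading_eigenvector(mat: Matrix, iters: int = 500,
                        tol: float = 1e-12) -> Optional[List[float]]:
    """Power iteration from a constant positive start vector.

    Returns a unit-norm vector, or None if the iterate collapses to zero
    (e.g. a window in which no subject carries any variant).
    """
    n = len(mat)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(mat[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return None
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def sliding_window_scan(genotypes: List[List[int]], window: int,
                        step: int) -> List[Tuple[int, int, float]]:
    """Compare each regional leading eigenvector with the global one.

    Windows are [start, end) ranges of consecutive variant columns.
    Returns (start, end, |r|); the absolute value is taken because the
    sign of an eigenvector is arbitrary.
    """
    m = len(genotypes[0])
    global_vec = leading_eigenvector(jaccard_matrix(genotypes))
    results = []
    for start in range(0, m - window + 1, step):
        end = start + window
        region = [row[start:end] for row in genotypes]
        local_vec = leading_eigenvector(jaccard_matrix(region))
        if global_vec is None or local_vec is None:
            r = math.nan
        else:
            r = abs(pearson(global_vec, local_vec))
        results.append((start, end, r))
    return results


if __name__ == "__main__":
    # Toy genotype counts: 3 subjects x 4 variants (made-up data).
    G = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
    for start, end, r in sliding_window_scan(G, window=2, step=1):
        print(f"variants [{start}, {end}): |r| = {r:.3f}")
```

For clarity, the sketch uses dense matrices and runs in O(n²·w) time per window. The efficiency claimed in the abstract comes from sparse matrix algebra in C++, which this toy version does not attempt. Other similarity matrices (covariance, genomic relationship, weighted Jaccard) or other comparison metrics could be substituted for `jaccard_matrix` and `pearson`.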
Pages: 82–98
Number of pages: 17