Fast and accurate out-of-core PCA framework for large scale biobank data

Cited by: 4
Authors
Li, Zilong [1]
Meisner, Jonas [2, 3]
Albrechtsen, Anders [1]
Affiliations
[1] Univ Copenhagen, Dept Biol, Sect Computat & RNA Biol, DK-2200 Copenhagen, Denmark
[2] Copenhagen Univ Hosp, Mental Hlth Ctr Copenhagen, Biol & Precis Psychiat, DK-2100 Copenhagen, Denmark
[3] Univ Copenhagen, Novo Nord Fdn Ctr Prot Res, DK-2200 Copenhagen, Denmark
Keywords
ALGORITHM; GENOME;
DOI
10.1101/gr.277525.122
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Principal component analysis (PCA) is widely used in statistics, machine learning, and genomics for dimensionality reduction and for uncovering low-dimensional latent structure. To address the challenges posed by ever-growing data size, fast and memory-efficient PCA methods have gained prominence. In this paper, we propose a novel randomized singular value decomposition (RSVD) algorithm implemented in PCAone, featuring a window-based optimization scheme that accelerates convergence while improving accuracy. Additionally, PCAone incorporates out-of-core and multithreaded implementations of the existing Implicitly Restarted Arnoldi Method (IRAM) and RSVD. Through comprehensive evaluations using multiple large-scale real-world data sets in different fields, we show the advantage of PCAone over existing methods. The new algorithm achieves significantly shorter computation time while maintaining accuracy comparable to the slower IRAM method. Notably, our analyses of UK Biobank, comprising around 0.5 million individuals and 6.1 million common single nucleotide polymorphisms, show that PCAone accurately computes the top 40 principal components within 9 h. This analysis effectively captures population structure, signals of selection, structural variants, and low recombination regions, utilizing <20 GB of memory and 20 CPU threads. Furthermore, when applied to single-cell RNA sequencing data featuring 1.3 million cells, PCAone accurately captures the top 40 principal components in 49 min. This performance represents a 10-fold improvement over state-of-the-art tools.
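For readers unfamiliar with RSVD, the following is a minimal sketch of the generic randomized SVD with power iterations, written with NumPy. It is not the window-based scheme that the abstract attributes to PCAone, and it is fully in-memory rather than out-of-core. The function name `randomized_svd` and the parameters `oversample`, `n_iter`, and `seed` are illustrative assumptions, not part of PCAone's interface. The input matrix is assumed to be already centered, as PCA requires.

```python
import numpy as np


def randomized_svd(A, k, oversample=10, n_iter=4, seed=0):
    """Approximate the top-k singular triplets of A (generic RSVD sketch).

    Hypothetical helper for illustration only; A is assumed to be a
    centered dense matrix that fits in memory.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    # Sketch size: target rank plus a small oversampling margin.
    size = min(k + oversample, n_rows, n_cols)

    # Project A onto a random subspace to capture its dominant range.
    omega = rng.standard_normal((n_cols, size))
    Q, _ = np.linalg.qr(A @ omega)

    # Power iterations sharpen the subspace; QR keeps it well conditioned.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Solve the small problem exactly, then map back to the original space.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]
```

In this sketch each power iteration makes two passes over `A`. An out-of-core variant would read `A` from disk in blocks during those passes instead of holding it in memory.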
Pages: 1599 / 1608
Number of pages: 10