High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis

Cited by: 0
Authors
Nie, Jinyu [1, 2]
Qin, Zhilong [3 ]
Liu, Wei [4 ]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
[3] Southwestern Univ Finance & Econ, Inst Western China Econ Res, Chengdu, Peoples R China
[4] Sichuan Univ, Sch Math, Chengdu, Peoples R China
Keywords
generalized factor model; high dimension; mixed-type data; overdispersion; variational EM; MAXIMUM-LIKELIHOOD; INFERENCE; NUMBER;
DOI
10.1002/sim.10213
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM.
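The abstract does not spell out the singular value ratio criterion, so the following is only a minimal sketch of the general idea under stated assumptions: it picks the number of factors as the index that maximizes the ratio of consecutive singular values of a matrix. The function name `singular_value_ratio`, the simulated linear Gaussian data, and the choice of `q_max` are all illustrative and are not taken from the paper or from the GFM package.

```python
import numpy as np


def singular_value_ratio(matrix, q_max):
    """Pick q in 1..q_max that maximizes s_q / s_{q+1}.

    Generic consecutive-ratio rule (an assumption for illustration);
    the paper's criterion may be defined on a different matrix.
    """
    s = np.linalg.svd(matrix, compute_uv=False)  # sorted in descending order
    if q_max >= len(s):
        raise ValueError("q_max must be smaller than the number of singular values")
    ratios = s[:q_max] / s[1:q_max + 1]
    return int(np.argmax(ratios)) + 1, ratios


# Simulated example: a rank-3 signal plus small Gaussian noise.
rng = np.random.default_rng(0)
n, p, q_true = 200, 50, 3
factors = rng.normal(size=(n, q_true))      # hypothetical latent factors
loadings = rng.normal(size=(p, q_true))     # hypothetical loading matrix
noise = 0.1 * rng.normal(size=(n, p))
low_rank_plus_noise = factors @ loadings.T + noise

q_hat, ratios = singular_value_ratio(low_rank_plus_noise, q_max=10)
print("estimated number of factors:", q_hat)
```

In this toy setting the gap between the third and fourth singular values dominates the other ratios, which is the behavior such a criterion relies on. The overdispersed mixed-type data in the paper would need the fitted model rather than a plain SVD of the observations.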
Pages: 4836-4849
Number of pages: 14
Related papers
50 in total
  • [41] Small sample sizes: A big data problem in high-dimensional data analysis
    Konietschke, Frank
    Schwab, Karima
    Pauly, Markus
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2021, 30 (03) : 687 - 701
  • [42] High-dimensional disjoint factor analysis with its EM algorithm version
    Cai, Jingyu
    Adachi, Kohei
    JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2021, 4 (01) : 427 - 448
  • [43] A Supervised Learning Model for High-Dimensional and Large-Scale Data
    Peng, Chong
    Cheng, Jie
    Cheng, Qiang
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2017, 8 (02)
  • [44] Comprehensive analysis of juvenile idiopathic arthritis patients' immune characteristics based on bulk and single-cell sequencing data
    Liu, Mubo
    Gong, Yadong
    Lin, Mu
    Ma, Qingqing
    FRONTIERS IN MOLECULAR BIOSCIENCES, 2024, 11
  • [45] Generalized Factor Model for Ultra-High Dimensional Correlated Variables with Mixed Types
    Liu, Wei
    Lin, Huazhen
    Zheng, Shurong
    Liu, Jin
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (542) : 1385 - 1401
  • [46] A Note on the Likelihood Ratio Test in High-Dimensional Exploratory Factor Analysis
    He, Yinqiu
    Wang, Zi
    Xu, Gongjun
    PSYCHOMETRIKA, 2021, 86 (02) : 442 - 463
  • [47] Detecting high-dimensional determinism in time series with application to human movement data
    Ramdani, Sofiane
    Bouchara, Frederic
    Caron, Olivier
    NONLINEAR ANALYSIS-REAL WORLD APPLICATIONS, 2012, 13 (04) : 1891 - 1903
  • [48] A high-dimensional test on linear hypothesis of means under a low-dimensional factor model
    Cao, Mingxiang
    He, Yuanjing
    METRIKA, 2022, 85 (05) : 557 - 572
  • [49] PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data
    Franzen, Oscar
    Gan, Li-Ming
    Bjorkegren, Johan L. M.
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2019,
  • [50] Leverage, Asymmetry, and Heavy Tails in the High-Dimensional Factor Stochastic Volatility Model
    Li, Mengheng
    Scharth, Marcel
    JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2022, 40 (01) : 285 - 301