High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis

Authors
Nie, Jinyu [1, 2]
Qin, Zhilong [3 ]
Liu, Wei [4 ]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
[3] Southwestern Univ Finance & Econ, Inst Western China Econ Res, Chengdu, Peoples R China
[4] Sichuan Univ, Sch Math, Chengdu, Peoples R China
Keywords
generalized factor model; high dimension; mixed-type data; overdispersion; variational EM; MAXIMUM-LIKELIHOOD; INFERENCE; NUMBER
DOI
10.1002/sim.10213
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Current high-dimensional linear factor models fail to account for different variable types, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. Yet overdispersion is prevalent in practice, particularly in biomedical and genomic studies. To address this need, we propose an overdispersed generalized factor model (OverGFM) for high-dimensional nonlinear factor analysis of overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that the factors alone cannot account for. This, however, introduces significant computational challenges, because the nonlinear model then involves two high-dimensional latent random matrices. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. The algorithm provides explicit iterative solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the number of factors, and numerical results demonstrate its effectiveness. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in estimation accuracy and computational efficiency. We further demonstrate the practical merit of our method by applying it to two genomics datasets. To facilitate its use, we have integrated the implementation of OverGFM into the R package GFM.
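Two ideas in the abstract can be illustrated with a small, generic sketch: an extra error term in the linear predictor inflates the variance of counts beyond the Poisson mean, and a ratio of consecutive singular values can point to the number of factors. The code below is not the OverGFM implementation and does not use the GFM package; the function names `dispersion_index` and `singular_value_ratio`, the simulated data, and all parameter values are assumptions made for illustration, and the ratio rule is applied here to a noisy low-rank matrix rather than to the model estimates used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def dispersion_index(x):
    """Variance-to-mean ratio; close to 1 for Poisson counts."""
    return x.var() / x.mean()


def singular_value_ratio(mat, q_max=10):
    """Pick the k (1 <= k <= q_max) that maximizes s_k / s_{k+1}."""
    s = np.linalg.svd(mat, compute_uv=False)  # sorted in descending order
    q_max = min(q_max, len(s) - 1)
    ratios = s[:q_max] / s[1:q_max + 1]
    return int(np.argmax(ratios)) + 1, ratios


# Part 1: overdispersion from an additional error term (illustrative values).
m = 20000
eps = rng.normal(scale=0.5, size=m)            # hypothetical extra error term
plain = rng.poisson(np.exp(1.0), size=m)       # Poisson with a fixed log-rate
over = rng.poisson(np.exp(1.0 + eps))          # log-rate perturbed by eps
print("dispersion (plain):", dispersion_index(plain))
print("dispersion (with error term):", dispersion_index(over))

# Part 2: singular value ratio on a noisy low-rank matrix (assumed setup).
n, p, q_true = 200, 50, 3
H = rng.normal(size=(n, q_true))               # latent factors
B = rng.normal(size=(p, q_true))               # loadings
noisy = H @ B.T + 0.1 * rng.normal(size=(n, p))
q_hat, ratios = singular_value_ratio(noisy, q_max=10)
print("estimated number of factors:", q_hat)
```

In this toy setup the largest gap between consecutive singular values sits right after the third one, so the ratio rule recovers the simulated number of factors. With real mixed-type data the ratio would be computed from fitted model quantities, which this sketch does not attempt.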
Pages: 4836-4849
Number of pages: 14