Multilevel regularized regression for simultaneous taxa selection and network construction with metagenomic count data

Cited by: 11
Authors
Liu, Zhenqiu [1]
Sun, Fengzhu [2]
Braun, Jonathan [3]
McGovern, Dermot P. B. [4]
Piantadosi, Steven [1]
Affiliations
[1] Cedars Sinai Med Ctr, Samuel Oschin Comprehens Canc Inst, Los Angeles, CA 90048 USA
[2] Univ So Calif, Dept Biol Sci, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[3] Univ Calif Los Angeles, David Geffen Sch Med, Dept Pathol & Lab Med, Los Angeles, CA 90095 USA
[4] F Widjaja Fdn Inflammatory Bowel & Immunobiol Res, Cedars Sinai Med Ctr, Los Angeles, CA 90048 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
LOCAL SIMILARITY ANALYSIS; VARIABLE SELECTION; MODEL; DISEASE;
DOI
10.1093/bioinformatics/btu778
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Identifying disease-associated taxa and constructing networks of bacterial interactions are two important tasks that are usually studied separately. In reality, the differentiation of disease-associated taxa and the correlation among taxa may affect each other: one genus can appear differentiated because it is highly correlated with another, highly differentiated genus. In addition, network structures may vary under different clinical conditions. Permutation tests are commonly used to detect differences between networks in distinct phenotypes, but they are time-consuming.
Results: In this manuscript, we propose a multilevel regularized regression method to simultaneously identify taxa and construct networks. We also extend the framework to allow construction of a common network and a differentiated network together. An efficient algorithm with a dual formulation is developed to handle the large-scale n << m problem, with a large number of taxa (m) and a small number of samples (n). The proposed method is regularized with a general L_p (p ∈ [0, 2]) penalty and models the effects of taxa abundance differentiation and correlation jointly. We demonstrate that it can identify both true and biologically significant genera and network structures.
Availability and implementation: Software MLRR in MATLAB is available at http://biostatistics.csmc.edu/mlrr/.
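Illustrative note: the abstract does not spell out the dual formulation, so the sketch below is not the MLRR algorithm. It only shows, for the simplest case of an L_2 (ridge) penalty, why a dual form helps when n << m: the primal solution requires an m x m linear system, whereas the equivalent dual solution requires only an n x n system. The function names, the toy data, and the value of `lam` are all assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_primal(X, y, lam):
    """w = (X^T X + lam I)^{-1} X^T y; solves an m x m system."""
    n, m = len(X), len(X[0])
    G = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(m)] for a in range(m)]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(m)]
    return solve(G, rhs)


def ridge_dual(X, y, lam):
    """w = X^T (X X^T + lam I)^{-1} y; solves only an n x n system."""
    n, m = len(X), len(X[0])
    K = [[sum(X[i][a] * X[j][a] for a in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)  # one dual variable per sample
    return [sum(X[i][a] * alpha[i] for i in range(n)) for a in range(m)]


# Hypothetical toy data: n = 3 samples, m = 6 features (more features than samples).
X = [[1.0, 0.0, 2.0, 1.0, 0.0, 3.0],
     [0.0, 1.0, 1.0, 0.0, 2.0, 1.0],
     [2.0, 1.0, 0.0, 1.0, 1.0, 0.0]]
y = [1.0, 0.0, 2.0]
w_primal = ridge_primal(X, y, lam=0.5)
w_dual = ridge_dual(X, y, lam=0.5)
```

For any `lam > 0` the two functions return the same coefficient vector up to floating-point error, so the cheaper n x n solve can replace the m x m one. Penalties with p < 2 do not have this closed form and need an iterative solver.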
Pages: 1067-1074
Page count: 8
Related papers
16 in total
  • [1] Nonparametric Regularized Regression for Phenotype-Associated Taxa Selection and Network Construction with Metagenomic Count Data
    Guo, Wenchuan
    Liu, Zhenqiu
    Ma, Shujie
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2016, 23 (11) : 877 - 890
  • [2] Network construction and structure detection with metagenomic count data
    Liu, Zhenqiu
    Lin, Shili
    Piantadosi, Steven
    BIODATA MINING, 2015, 8
  • [3] Simultaneous variable selection and estimation in semiparametric regression of mixed panel count data
    Ge, Lei
    Hu, Tao
    Li, Yang
    BIOMETRICS, 2024, 80 (01)
  • [4] A connected network-regularized logistic regression model for feature selection
    Li, Lingyu
    Liu, Zhi-Ping
    APPLIED INTELLIGENCE, 2022, 52 (10) : 11672 - 11702
  • [5] Measures of clustering and heterogeneity in multilevel Poisson regression analyses of rates/count data
    Austin, Peter C.
    Stryhn, Henrik
    Leckie, George
    Merlo, Juan
    STATISTICS IN MEDICINE, 2018, 37 (04) : 572 - 589
  • [6] A novel wavelength interval selection based on split regularized regression for spectroscopic data
    Huang, Xin
    Xia, Li
    JOURNAL OF MATHEMATICAL CHEMISTRY, 2023, 61 (04) : 877 - 892
  • [7] A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data
    Hinton, Andrew L.
    Mucha, Peter J.
    FRONTIERS IN MICROBIOLOGY, 2022, 13
  • [8] NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA
    Sun, Hokeun
    Lin, Wei
    Feng, Rui
    Li, Hongzhe
    STATISTICA SINICA, 2014, 24 (03) : 1433 - 1459
  • [9] Smoothed Quantile Regression with Factor-Augmented Regularized Variable Selection for High Correlated Data
    Zhang, Yongxia
    Wang, Qi
    Tian, Maozai
    MATHEMATICS, 2022, 10 (16)
  • [10] Simultaneous variable selection and estimation for multivariate multilevel longitudinal data with both continuous and binary responses
    Li, Haocheng
    Shu, Di
    Zhang, Yukun
    Yi, Grace Y.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 118 : 126 - 137