Gene and pathway identification with Lp penalized Bayesian logistic regression

Cited by: 12
Authors
Liu, Zhenqiu [1]
Gartenhaus, Ronald B. [2, 3]
Tan, Ming [1]
Jiang, Feng [4]
Jiao, Xiaoli [1]
Affiliations
[1] Univ Maryland, Greenebaum Canc Ctr, Div Biostat, Baltimore, MD 21201 USA
[2] Univ Maryland, Sch Med, Dept Med, Baltimore, MD 21201 USA
[3] Univ Maryland, Sch Med, Greenebaum Canc Ctr, Baltimore, MD 21201 USA
[4] Univ Maryland, Sch Med, Dept Pathol, Baltimore, MD 21201 USA
DOI
10.1186/1471-2105-9-412
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Identifying genes and pathways associated with diseases such as cancer has been a subject of considerable research in bioinformatics and computational biology in recent years. It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance. Even a very small change in the expression of a particular gene may have dramatic physiological consequences if the protein encoded by this gene plays a catalytic role in a specific cell function. Moreover, highly correlated genes may function together in the same biological pathway. Finally, in sparse logistic regression with an Lp (p < 1) penalty, the degree of sparsity obtained is determined by the value of the regularization parameter. Usually this parameter must be carefully tuned through cross-validation, which is time-consuming. Results: In this paper, we propose a simple Bayesian approach that integrates the regularization parameter out analytically using a new prior. There is therefore no longer a need for parameter selection, as the parameter is eliminated entirely from the model. The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation. We also define a novel similarity measure and develop an integrated algorithm to find regulatory genes that show low expression changes but are highly correlated with the selected genes. Pathways of those correlated genes were identified with DAVID (http://david.abcc.ncifcrf.gov/). Conclusion: Experimental results with gene expression data demonstrate that the proposed methods can be used to identify important genes and pathways related to cancer and to build a parsimonious model for future patient predictions.
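As a rough illustration of the setting the abstract describes, the sketch below evaluates a generic Lp penalized logistic regression objective, the negative log-likelihood plus lam * sum(|w_j|^p) with p < 1. It is only an assumed textbook form, not the BLpLog algorithm: the function name `lp_penalized_nll`, the toy data, and the explicit `lam` argument are all hypothetical, and the paper's new prior, which removes the regularization parameter, is not reproduced here.

```python
import numpy as np


def lp_penalized_nll(w, X, y, lam, p=0.5):
    """Logistic negative log-likelihood plus lam * sum(|w_j|^p).

    Hypothetical helper for illustration; labels y are coded 0/1.
    """
    z = X @ w
    # log(1 + exp(z)) evaluated stably via logaddexp(0, z)
    nll = np.sum(np.logaddexp(0.0, z) - y * z)
    penalty = lam * np.sum(np.abs(w) ** p)
    return nll + penalty


# Toy data (made up): 20 samples, 5 "genes", label driven by the first gene.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = (X[:, 0] > 0).astype(float)

# A sparse weight vector with a single nonzero coefficient.
w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

# The objective grows with lam, which is why lam normally has to be tuned.
for lam in (0.0, 1.0, 10.0):
    print(lam, lp_penalized_nll(w, X, y, lam))
```

In this generic form, `lam` controls how strongly nonzero coefficients are penalized, and is the quantity that would otherwise be tuned by cross-validation.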
Pages: 19
Related papers
50 in total
  • [1] Gene and pathway identification with Lp penalized Bayesian logistic regression
    Zhenqiu Liu
    Ronald B Gartenhaus
    Ming Tan
    Feng Jiang
    Xiaoli Jiao
    BMC Bioinformatics, 9
  • [2] Classification of gene microarrays by penalized logistic regression
    Zhu, J
    Hastie, T
    BIOSTATISTICS, 2004, 5 (03) : 427 - 443
  • [3] Penalized logistic regression for detecting gene interactions
    Park, Mee Young
    Hastie, Trevor
    BIOSTATISTICS, 2008, 9 (01) : 30 - 50
  • [4] Approximate Bayesian logistic regression via penalized likelihood by data augmentation
    Discacciati, Andrea
    Orsini, Nicola
    Greenland, Sander
    Stata Journal, 2015, 15 (03): : 712 - 736
  • [5] Sparse logistic regression with Lp penalty for biomarker identification
    Liu, Zhenqiu
    Jiang, Feng
    Tian, Guoliang
    Wang, Suna
    Sato, Fumiaki
    Meltzer, Stephen J.
    Tan, Ming
    STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2007, 6
  • [6] Bayesian multinomial logistic regression for author identification
    Madigan, D
    Genkin, A
    Lewis, DD
    Fradkin, D
    BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING, 2005, 803 : 509 - 516
  • [7] Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis
    Liu, Cheng
    Wong, Hau San
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (01) : 312 - 321
  • [8] Identification of Grouped Rare and Common Variants via Penalized Logistic Regression
    Ayers, Kristin L.
    Cordell, Heather J.
    GENETIC EPIDEMIOLOGY, 2013, 37 (06) : 592 - 602
  • [9] Identification of Grouped Rare and Common Variants via Penalized Logistic Regression
    Ayers, Kristin L.
    Cordell, Heather J.
    GENETIC EPIDEMIOLOGY, 2012, 36 (07) : 730 - 731
  • [10] Penalized logistic regression with prior information for microarray gene expression classification
    Genc, Murat
    INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2024, 20 (01): : 107 - 122