An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification

Cited by: 14
Authors
Esfahani, Mohammad Shahrokh [1, 2]
Dougherty, Edward R. [1, 2]
Affiliations
[1] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
[2] Texas A&M Univ, Ctr Bioinformat & Genom Syst Engn, College Stn, TX 77843 USA
Keywords
Phenotype classification; biological pathways; prior probability construction; optimal Bayesian classifier; regularized expected mean log-likelihood; gene regulatory networks; square error estimation; cancer genomics data; nonparametric problems; maximum entropy; uncertainty; inference; distributions; model; principle
DOI
10.1109/TCBB.2015.2424407
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Using prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete nor regulatory, and have no timing associated with them, they can constrain the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways into prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: the mammalian cell cycle and a p53 pathway model.
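The conjugacy that the abstract relies on can be illustrated with a minimal sketch. It shows only the standard Dirichlet-multinomial update (prior hyperparameters plus observed counts give the posterior hyperparameters) and does not reproduce the paper's optimization-based prior construction. The function names, the prior values, and the counts are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def dirichlet_posterior(alpha: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Conjugate update: a Dirichlet(alpha) prior combined with multinomial
    counts yields a Dirichlet(alpha + counts) posterior."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha: Sequence[float]) -> List[float]:
    """Mean of a Dirichlet distribution: each hyperparameter over their sum."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical prior over four discrete states; the larger first entry stands
# in for prior knowledge favoring that state (an assumption, not paper data).
prior_alpha = [2.0, 1.0, 1.0, 1.0]
# Hypothetical small training sample: observed count for each state.
observed_counts = [3, 0, 1, 1]

posterior_alpha = dirichlet_posterior(prior_alpha, observed_counts)
posterior_mean = dirichlet_mean(posterior_alpha)
print(posterior_alpha)
print(posterior_mean)
```

With few observations the prior hyperparameters carry much of the weight in the posterior mean, which is why the choice of prior matters in the small-sample setting the abstract describes.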
Pages: 1304-1321
Number of pages: 18