Gene Clustering via Integrated Markov Models Combining Individual and Pairwise Features

Cited by: 15
Authors
Vignes, Matthieu [1]
Forbes, Florence [2]
Affiliations
[1] Scottish Crop Res Inst, Dundee DD2 5DA, Scotland
[2] INRIA Rhone Alpes, F-38334 Saint Ismier, France
Keywords
Markov random fields; model-based clustering; metabolic networks; gene expression; false discovery rate; expression; network
DOI
10.1109/TCBB.2007.70248
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Clustering of genes into groups sharing common characteristics is a useful exploratory technique for a number of subsequent computational analyses. A wide range of clustering algorithms has been proposed, in particular to analyze gene expression data, but most of them consider genes as independent entities or include relevant information on gene interactions in a suboptimal way. We propose a probabilistic model that has the advantage of accounting for individual data (e.g., expression) and pairwise data (e.g., interaction information coming from biological networks) simultaneously. Our model is based on hidden Markov random field models in which parametric probability distributions account for the distribution of individual data. Data on pairs, possibly reflecting distance or similarity measures between genes, are then included through a graph in which the nodes represent the genes and the edges are weighted according to the available interaction information. Being probabilistic, the model has many interesting theoretical features. In addition, preliminary experiments on simulated and real data show promising results and point to the gain from using such an approach. Availability: The software used in this work is written in C++ and is available with other supplementary material at http://mistis.inrialpes.fr/people/forbes/transparentia/supplementary.html.
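As an illustration of how individual and pairwise data can be combined in one labeling, the following is a minimal sketch and not the authors' C++ software or their estimation procedure. It assumes a one-dimensional expression value per gene, Gaussian clusters with known means and a shared standard deviation, and a Potts-style reward for neighbors that share a label. The labels are found by iterated conditional modes (ICM), a simple technique swapped in here for brevity. The function name `icm_cluster` and the parameters `beta`, `sigma`, and `n_sweeps` are hypothetical.

```python
def icm_cluster(x, edges, means, sigma=1.0, beta=1.0, n_sweeps=10):
    """Label genes with iterated conditional modes (ICM) on a weighted graph.

    x      : list of floats, one expression value per gene (assumed 1-D)
    edges  : dict mapping (i, j) -> nonnegative interaction weight
    means  : list of cluster means (assumed known in this sketch)
    sigma  : shared Gaussian standard deviation (assumption)
    beta   : strength of the pairwise term; beta=0 ignores the graph
    """
    n, k = len(x), len(means)

    # Build symmetric adjacency lists from the weighted edges.
    nbrs = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    def log_lik(i, c):
        # Gaussian log-density of gene i under cluster c, up to a constant.
        return -0.5 * ((x[i] - means[c]) / sigma) ** 2

    # Initialise each label from the individual data alone.
    z = [max(range(k), key=lambda c: log_lik(i, c)) for i in range(n)]

    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            def score(c):
                # Individual term plus weighted agreement with neighbors.
                agree = sum(w for j, w in nbrs[i] if z[j] == c)
                return log_lik(i, c) + beta * agree
            best = max(range(k), key=score)
            if best != z[i]:
                z[i] = best
                changed = True
        if not changed:
            break  # a full sweep changed nothing, so the labeling is stable
    return z


# Toy data: gene 2 has an ambiguous expression value but interacts with genes 0 and 1.
x = [0.1, -0.2, 1.2, 2.1, 1.9]
means = [0.0, 2.0]
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (3, 4): 1.0}
labels_without_graph = icm_cluster(x, edges, means, beta=0.0)
labels_with_graph = icm_cluster(x, edges, means, beta=1.0)
```

In this toy example, gene 2 is placed with genes 3 and 4 when only expression is used, and moves to the cluster of genes 0 and 1 once the interaction graph is taken into account.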
Pages: 260-270
Number of pages: 11