A hierarchical nonparametric Bayesian approach for medical images and gene expressions classification

Cited by: 13
Authors
Elguebaly, Tarek [1]
Bouguila, Nizar [2]
Affiliations
[1] Concordia Univ, Elect & Comp Engn Dept ECE, Montreal, PQ H3G 1T7, Canada
[2] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Mixture models; Gamma distribution; Feature selection; Nonparametric Bayes; MCMC; Gibbs sampling; Mammography; Gene expression; CONTENT-BASED RETRIEVAL; FEATURE-SELECTION; MIXTURE; CANCER; DISTRIBUTIONS; DATABASES; INFERENCE; TISSUE; MODEL;
DOI
10.1007/s00500-014-1242-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rapid growth of databases in almost every aspect of life has created great demand for powerful new tools that turn data into useful information, which has encouraged researchers to explore and develop new machine learning ideas and methods. Mixture models are one machine learning technique that has received considerable attention because of their ability to handle multidimensional data efficiently and effectively. In this paper, we present a solution to two challenging problems: modeling non-Gaussian data and determining the set of relevant features in the data. Non-Gaussian data, which are common in computer vision, image processing, medical, and bioinformatics applications, are modeled by developing a generative infinite Gamma mixture model. The Gamma distribution is chosen for its ability to handle long-tailed distributions, which allows it to approximate data with outliers well. The proposed model, which can be viewed as a Dirichlet process mixture of Gamma distributions, addresses feature selection by determining a set of relevant features for each data cluster, which provides better interpretability and generalization. We then propose an efficient algorithm that learns the parameters of this infinite model by estimating all posterior quantities of interest using Markov chain Monte Carlo (MCMC) simulations. The algorithm thus performs model selection, parameter learning, and feature selection simultaneously in a single step for the Gamma mixture model. Finally, we show how the model can be used in two challenging applications, medical image classification and gene expression classification, and compare it with other popular models in the literature.
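As a rough illustration of what "a Dirichlet process mixture of Gamma distributions" means generatively, the sketch below draws univariate data from a truncated stick-breaking approximation. It is not the authors' model or inference algorithm: it omits feature selection and MCMC entirely, and the function name, the truncation level, and the Gamma base measure over shapes and rates are assumptions chosen only for illustration.

```python
import numpy as np


def sample_dp_gamma_mixture(n, alpha=1.0, truncation=50, seed=0):
    """Draw n points from a truncated stick-breaking DP mixture of Gammas.

    Hypothetical illustration only: the base measure below is an assumption,
    not the prior used in the paper.
    """
    rng = np.random.default_rng(seed)

    # Stick-breaking construction: beta_k ~ Beta(1, alpha),
    # w_k = beta_k * prod_{j<k} (1 - beta_j).
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    weights /= weights.sum()  # renormalize to absorb the truncated tail

    # Component parameters drawn from an assumed base measure.
    shapes = rng.gamma(2.0, 2.0, size=truncation)
    rates = rng.gamma(2.0, 1.0, size=truncation)

    # Assign each point to a component, then draw from that Gamma.
    z = rng.choice(truncation, size=n, p=weights)
    x = rng.gamma(shape=shapes[z], scale=1.0 / rates[z])
    return x, z, weights


if __name__ == "__main__":
    x, z, w = sample_dp_gamma_mixture(500, alpha=2.0)
    print("occupied components:", len(np.unique(z)))
```

The number of occupied components is not fixed in advance; it grows with the concentration parameter `alpha` and the sample size, which is the property that lets an infinite mixture avoid choosing the number of clusters beforehand.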
Pages: 189-204
Number of pages: 16