A hierarchical nonparametric Bayesian approach for medical images and gene expressions classification

Times cited: 13
Authors
Elguebaly, Tarek [1 ]
Bouguila, Nizar [2 ]
Affiliations
[1] Concordia Univ, Elect & Comp Engn Dept ECE, Montreal, PQ H3G 1T7, Canada
[2] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Mixture models; Gamma distribution; Feature selection; Nonparametric Bayes; MCMC; Gibbs sampling; Mammography; Gene expression; CONTENT-BASED RETRIEVAL; FEATURE-SELECTION; MIXTURE; CANCER; DISTRIBUTIONS; DATABASES; INFERENCE; TISSUE; MODEL;
DOI
10.1007/s00500-014-1242-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recent proliferation of databases in almost every aspect of life has created a great demand for powerful new tools that turn data into useful information, which has encouraged researchers to explore and develop new machine learning ideas and methods. Mixture models are one of the machine learning techniques receiving considerable attention because they handle multidimensional data efficiently and effectively. In this paper, we present a solution to two challenging issues: modeling non-Gaussian data and determining the set of relevant features in the data. Non-Gaussian data, which are common in computer vision, image processing, medical, and bioinformatics applications, are modeled by developing a generative infinite Gamma mixture model. The Gamma distribution is chosen for its ability to handle long-tailed distributions, which allows it to approximate data with outliers well. The proposed model, which can be viewed as a Dirichlet process mixture of Gamma distributions, addresses the feature selection problem by determining a set of relevant features for each data cluster, which provides better interpretability and generalization capabilities. We then propose an efficient algorithm that learns the parameters of this infinite model by estimating all its posterior quantities of interest using Markov chain Monte Carlo (MCMC) simulations. Thus, our algorithm performs model selection, parameter learning, and feature selection simultaneously in a single step for the Gamma mixture model. Furthermore, we show how the model can be used in two challenging applications, namely the classification of medical images and of gene expression data, and compare it with other popular models in the literature.
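As a rough illustration of what "a Dirichlet process mixture of Gamma distributions" means generatively, the following minimal Python sketch draws one-dimensional data from a truncated stick-breaking approximation. It is not the paper's algorithm: it performs no MCMC inference and no feature selection, and the truncation level, the concentration `alpha`, and the Gamma priors on each component's shape and scale are assumptions chosen only for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process prior."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights


def sample_gamma_dp_mixture(n, alpha=1.0, truncation=20, seed=0):
    """Draw n points from a truncated DP mixture of Gamma components."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Hypothetical base measure: each component's (shape, scale) is drawn
    # from Gamma priors; these hyperparameters are illustrative only.
    params = [
        (rng.gammavariate(2.0, 2.0), rng.gammavariate(2.0, 1.0))
        for _ in range(truncation)
    ]
    # Assign each observation to a component, then sample from that Gamma.
    labels = rng.choices(range(truncation), weights=weights, k=n)
    data = [rng.gammavariate(*params[z]) for z in labels]
    return weights, labels, data


if __name__ == "__main__":
    weights, labels, data = sample_gamma_dp_mixture(100)
    print("components actually used:", len(set(labels)))
```

With a small `alpha`, most of the mass falls on a few components, so the number of clusters actually used is typically far below the truncation level; this is the behaviour that lets such models sidestep fixing the number of mixture components in advance.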
Pages: 189-204
Page count: 16
Related papers
69 in total
[1]   Image and Video Segmentation by Combining Unsupervised Generalized Gaussian Mixture Modeling and Feature Selection [J].
Allili, Mohand Said ;
Ziou, Djemel ;
Bouguila, Nizar ;
Boutemedjet, Sabri .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2010, 20 (10) :1373-1377
[2]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]   Content-based retrieval and analysis of mammographic masses [J].
Alto, H ;
Rangayyan, RM ;
Desautels, JEL .
JOURNAL OF ELECTRONIC IMAGING, 2005, 14 (02) :1-17
[4]  
[Anonymous], 1992, Stat. Sci., DOI 10.1214/SS/1177011143
[5]  
[Anonymous], 2000, NATURE STAT LEARNING, DOI 10.1007/978-1-4757-3264-1
[6]  
[Anonymous], 1998, HDB PATTERN RECOGNIT
[7]   GLOBAL AND LOCAL PRIORS, AND THE LOCATION OF LESIONS USING GAMMA-CAMERA IMAGERY [J].
AYKROYD, RG ;
GREEN, PJ .
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY OF LONDON SERIES A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 1991, 337 (1647) :323-342
[8]   A finite mixture model for simultaneous high-dimensional clustering, localized feature selection and outlier rejection [J].
Bouguila, Nizar ;
Almakadmeh, Khaled ;
Boutemedjet, Sabri .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (07) :6641-6656
[9]   A Dirichlet Process Mixture of Generalized Dirichlet Distributions for Proportional Data Modeling [J].
Bouguila, Nizar ;
Ziou, Djemel .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2010, 21 (01) :107-122
[10]   A DIRICHLET PROCESS MIXTURE OF DIRICHLET DISTRIBUTIONS FOR CLASSIFICATION AND PREDICTION [J].
Bouguila, Nizar ;
Ziou, Djemel .
2008 IEEE WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2008, :297-+