Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: A case study

Cited by: 95
Authors
Gaujoux, Renaud [2]
Seoighe, Cathal [1]
Affiliations
[1] Natl Univ Ireland Galway, Sch Math Stat & Appl Math, Galway, Ireland
[2] Univ Cape Town, Computat Biol Grp, Inst Infect Dis & Mol Med, ZA-7700 Rondebosch, South Africa
Keywords
NMF; Microarray; Gene expression; Deconvolution; Sample heterogeneity; MICROARRAY DATA; DISCOVERY; PATTERNS; PACKAGE; MODEL;
DOI
10.1016/j.meegid.2011.08.014
Chinese Library Classification number
R51 [Infectious diseases];
Discipline classification code
100401;
Abstract
Heterogeneity in sample composition is an inherent issue in many gene expression studies and, in many cases, should be taken into account in the downstream analysis to enable correct interpretation of the underlying biological processes. Typical examples are infectious disease or immunology-related studies using blood samples, where, for example, the proportions of lymphocyte sub-populations are expected to vary between cases and controls. Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, notably in bioinformatics, where its ability to extract meaningful information from high-dimensional data such as gene expression microarrays has been demonstrated. Very recently, it has been applied to biomarker discovery and gene expression deconvolution in heterogeneous tissue samples. Being essentially unsupervised, standard NMF methods are not guaranteed to find components corresponding to the cell types of interest in the sample, which may jeopardize the correct estimation of cell proportions. We have investigated the use of prior knowledge, in the form of a set of marker genes, to improve gene expression deconvolution with NMF algorithms. We found that this improves the consistency with which both cell type proportions and cell type gene expression signatures are estimated. The proposed method was tested on a microarray dataset consisting of pure cell types mixed in known proportions. Pearson correlation coefficients between true and estimated cell type proportions improved substantially (typically from about 0.5 to approximately 0.8) with the semi-supervised (marker-guided) versions of commonly used NMF algorithms. Furthermore, known marker genes associated with each cell type were assigned to the correct cell type more frequently by the guided versions.
We conclude that the use of marker genes improves the accuracy of gene expression deconvolution using NMF and suggest modifications to how the marker gene information is used that may lead to further improvements. © 2011 Elsevier B.V. All rights reserved.
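The marker-guided idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes standard multiplicative NMF updates for the Frobenius objective, and it assumes one simple way of using markers, namely forcing each marker gene's signature to zero in every cell type other than its own. The function name `marker_guided_nmf`, the `markers` dictionary format, and the synthetic data are all hypothetical.

```python
import numpy as np


def marker_guided_nmf(X, markers, n_iter=500, eps=1e-9, seed=0):
    """Approximate X (genes x samples) by W @ H with nonnegative factors.

    markers: dict mapping a cell-type index (0..k-1) to a list of gene row
    indices assumed to be expressed only in that cell type (hypothetical format).
    Returns the signature matrix W and column-normalized proportions.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    k = len(markers)
    W = rng.uniform(0.1, 1.0, size=(n_genes, k))
    H = rng.uniform(0.1, 1.0, size=(k, n_samples))

    # Assumed constraint: a marker gene may be nonzero only in its own cell type.
    mask = np.ones_like(W)
    for cell_type, genes in markers.items():
        mask[genes, :] = 0.0
        mask[genes, cell_type] = 1.0
    W *= mask

    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        W *= mask  # zeros already persist under these updates; kept explicit

    # Rescale each sample's coefficients to sum to one.
    proportions = H / (H.sum(axis=0, keepdims=True) + eps)
    return W, proportions


# Synthetic mixtures with known proportions (illustrative data only).
rng = np.random.default_rng(1)
n_genes, n_types, n_samples = 30, 3, 12
markers = {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8]}
W_true = rng.uniform(0.5, 2.0, size=(n_genes, n_types))
for k, genes in markers.items():
    others = [j for j in range(n_types) if j != k]
    W_true[np.ix_(genes, others)] = 0.0
true_props = rng.dirichlet(np.ones(n_types), size=n_samples).T
X = W_true @ true_props

W_est, est_props = marker_guided_nmf(X, markers)
for k in range(n_types):
    # Pearson correlation between true and estimated proportions per cell type.
    r = np.corrcoef(true_props[k], est_props[k])[0, 1]
    print(f"cell type {k}: Pearson r = {r:.3f}")
```

Because NMF factors are only defined up to a rescaling of each component, the normalized proportions here are best compared with the truth through per-cell-type correlation, as in the abstract, rather than as absolute values.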
Pages: 913-921
Page count: 9