Identification of clusters in tissue samples in gene expression data with Principal Component Analysis based on relative variance matrix

Times cited: 0
Authors
Nawaz, Uzma [1]
Ali, Asghar [1]
Affiliations
[1] Bahauddin Zakariya Univ, Dept Stat, Multan 60800, Pakistan
Source
AFRICAN JOURNAL OF MICROBIOLOGY RESEARCH | 2011 / Vol. 5 / No. 01
Keywords
Clustering methods; gene expression analysis; principal component analysis; the relative variance covariance matrix; principal component loadings; ACUTE MYELOID-LEUKEMIA; MOLECULAR CLASSIFICATION; PATTERNS; CANCER;
DOI
Not available
Chinese Library Classification (CLC) number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Principal Component Analysis (PCA) has long been used as a preprocessing step for clustering. We focus on the clustering of tissue samples in gene expression data. Many clustering techniques and algorithms for gene expression data are available in the literature, but the number of clusters remains ambiguous unless one relies on biologically known groups. A consensus on the number of clusters is needed across the wide variety of existing clustering techniques, which are based on different similarity or dissimilarity metrics. Conventionally, PCA is used for clustering either by forcing unit variance on each variable or by allowing the high variance of an individual variable to dominate the entire PCA result. We propose using the relative variance-covariance matrix in PCA, so as to give due consideration to both the joint and the individual variances in the dataset, and we identify clusters from the principal component loadings. We show empirically that the proposed approach to PCA is more informative than the available approaches for identifying cluster structure in tissue samples (sample expression profiles). The clusters formed agree with existing results on the data set under study and have a valid biological background.
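The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the relative variance-covariance matrix, so the sketch assumes, purely for illustration, that "relative" means dividing each covariance by the product of the two column means. The function names (`scatter_matrices`, `loadings`, `make_demo_data`) and the synthetic data are hypothetical. Tissue samples are treated as the PCA variables (columns) and genes as observations (rows), so that loadings refer to samples.

```python
import numpy as np


def scatter_matrices(X):
    """Return covariance, correlation, and mean-scaled covariance of the columns of X.

    X has one row per gene and one column per tissue sample.
    """
    cov = np.cov(X, rowvar=False)           # large-variance samples can dominate
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)           # forces unit variance on every sample
    means = X.mean(axis=0)
    # ASSUMPTION: "relative" is read here as scaling by the column means;
    # the abstract does not give the authors' definition.
    rel = cov / np.outer(means, means)
    return cov, corr, rel


def loadings(matrix, n_components=2):
    """Leading eigenvalues and eigenvectors (loadings) of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


def make_demo_data(n_genes=200, seed=0):
    """Synthetic data: two groups of three samples, one sample on a larger scale."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(5.0, 10.0, n_genes)
    group_a = rng.normal(0.0, 1.0, n_genes)
    group_b = rng.normal(0.0, 1.0, n_genes)
    cols = [base + group_a + rng.normal(0.0, 0.1, n_genes) for _ in range(3)]
    cols += [base + group_b + rng.normal(0.0, 0.1, n_genes) for _ in range(3)]
    X = np.column_stack(cols)
    X[:, 5] *= 20.0                         # exaggerate the scale of the last sample
    return X


if __name__ == "__main__":
    X = make_demo_data()
    names = ("covariance", "correlation", "mean-scaled covariance")
    for name, m in zip(names, scatter_matrices(X)):
        _, vecs = loadings(m)
        # Second-component loadings, one value per tissue sample
        print(name, np.round(vecs[:, 1], 2))
```

Under this assumed definition, the mean-scaled matrix is unchanged when a sample is multiplied by a positive constant, as the correlation matrix is, yet it does not force every sample to unit variance. Whether this matches the matrix used in the paper cannot be confirmed from the abstract alone.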
Pages: 34-43
Number of pages: 10