Coclustering-a useful tool for chemometrics

Cited by: 24
Authors
Bro, Rasmus [1]
Papalexakis, Evangelos E. [2]
Acar, Evrim [1]
Sidiropoulos, Nicholas D. [3]
Affiliations
[1] Univ Copenhagen, Dept Food Sci, Fac Life Sci, DK-1958 Frederiksberg, Denmark
[2] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[3] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN USA
Keywords
clustering; coclustering; L1 norm; sparsity; DECOMPOSITION; SELECTION; MATRIX; LASSO; OLIVE
DOI
10.1002/cem.1424
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Nowadays, chemometric applications in biology can readily deal with tens of thousands of variables, for instance, in omics and environmental analysis. Other areas of chemometrics also deal with distilling relevant information from highly information-rich data sets. Traditional tools such as principal component analysis or hierarchical clustering are often not optimal for providing succinct and accurate information from high-rank data sets. A relatively little-known approach that has shown significant potential in other areas of research is coclustering, where a data matrix is simultaneously clustered in its rows and columns (usually objects and variables). Coclustering is the tool of choice when only a subset of variables is related to a specific grouping among objects. Hence, coclustering allows a select number of objects to share a particular behavior on a select number of variables. In this paper, we describe the basics of coclustering and use three different example data sets to show the advantages and shortcomings of coclustering. Copyright (c) 2012 John Wiley & Sons, Ltd.
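The abstract describes coclustering as picking out a subset of objects that share a behavior on a subset of variables, and the keywords (L1 norm, sparsity, LASSO) point toward a sparsity-based formulation. As a minimal sketch, assuming a single cocluster is extracted by an L1-penalized rank-one approximation, min ||X - a b^T||_F^2 + lam * (||a||_1 + ||b||_1), solved by alternating soft-thresholding, the nonzero entries of a mark the objects and the nonzero entries of b mark the variables. This is one common way to obtain sparse coclusters, not necessarily the authors' exact algorithm; the function names, the initialization, the penalty value `lam`, and the toy matrix are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; small values become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_rank_one(X: Matrix, lam: float, n_iter: int = 100) -> Tuple[List[float], List[float]]:
    """Alternately minimize ||X - a b^T||_F^2 + lam * (||a||_1 + ||b||_1).

    Each half-step is exact: with b fixed, a[i] = soft(X[i] . b, lam / 2) / ||b||^2,
    and symmetrically for b with a fixed.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Hypothetical initialization: start b from the row of X with the largest norm.
    start = max(range(n_rows), key=lambda i: sum(v * v for v in X[i]))
    b = [float(v) for v in X[start]]
    a = [0.0] * n_rows
    for _ in range(n_iter):
        bb = sum(v * v for v in b)
        if bb == 0.0:
            a = [0.0] * n_rows
            break
        # Update object loadings with variable loadings fixed.
        a = [soft_threshold(sum(X[i][j] * b[j] for j in range(n_cols)), lam / 2) / bb
             for i in range(n_rows)]
        aa = sum(v * v for v in a)
        if aa == 0.0:
            # The penalty zeroed every object loading: no cocluster survives.
            b = [0.0] * n_cols
            break
        # Update variable loadings with object loadings fixed.
        b = [soft_threshold(sum(X[i][j] * a[i] for i in range(n_rows)), lam / 2) / aa
             for j in range(n_cols)]
    return a, b


def cocluster(X: Matrix, lam: float) -> Tuple[List[int], List[int]]:
    """Objects (rows) and variables (columns) with nonzero loadings form one cocluster."""
    a, b = sparse_rank_one(X, lam)
    rows = [i for i, v in enumerate(a) if v != 0.0]
    cols = [j for j, v in enumerate(b) if v != 0.0]
    return rows, cols


if __name__ == "__main__":
    # Toy data: objects 0-2 share a strong signal on variables 1-2; the rest is weak noise.
    X = [
        [0.0, 5.0, 5.0, 0.0, 0.0],
        [0.0, 5.0, 5.0, 0.0, 0.0],
        [0.0, 5.0, 5.0, 0.0, 0.0],
        [0.1, 0.0, 0.0, 0.1, 0.0],
        [0.0, 0.1, 0.0, 0.0, 0.1],
        [0.1, 0.0, 0.1, 0.0, 0.0],
    ]
    print(cocluster(X, lam=2.0))  # ([0, 1, 2], [1, 2])
```

The penalty controls the trade-off: with `lam = 0` the alternating updates become an ordinary rank-one least-squares fit in which loadings are generally nonzero everywhere, so no subset of objects or variables is singled out, while a very large `lam` removes everything. Extracting further coclusters, for example by subtracting a b^T and repeating, is likewise an assumption of this sketch rather than something stated in the abstract.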
Pages: 256-263
Number of pages: 8
References (21 in total)
[1] [Anonymous]. Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2003.
[2] Banerjee, A. Journal of Machine Learning Research, 2005, 6: 1705.
[3] Bhattacharjee, A; Richards, WG; Staunton, J; Li, C; Monti, S; Vasa, P; Ladd, C; Beheshti, J; Bueno, R; Gillette, M; Loda, M; Weber, G; Mark, EJ; Lander, ES; Wong, W; Johnson, BE; Golub, TR; Sugarbaker, DJ; Meyerson, M. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences of the United States of America, 2001, 98(24): 13790-13795.
[4] Cho, H. SIAM Proceedings Series, 2004: 114.
[5] Damian, Doris; Oresic, Matej; Verheij, Elwin; Meulman, Jacqueline; Friedman, Jerome; Adourian, Aram; Morel, Nicole; Smilde, Age; van der Greef, Jan. Applications of a new subspace clustering algorithm (COSA) in medical systems biology. Metabolomics, 2007, 3(1): 69-77.
[6] de la Mata-Espinosa, P.; Bosque-Sendra, J. M.; Bro, R.; Cuadros-Rodriguez, L. Olive oil quantification of edible vegetable oil blends using triacylglycerols chromatographic fingerprints and chemometric tools. Talanta, 2011, 85(1): 177-182.
[7] de la Mata-Espinosa, P.; Bosque-Sendra, J. M.; Bro, R.; Cuadros-Rodriguez, L. Discriminating olive and non-olive oils using HPLC-CAD and chemometrics. Analytical and Bioanalytical Chemistry, 2011, 399(6): 2083-2092.
[8] Dhillon, I. S. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001: 269. DOI 10.1145/502512.502550.
[9] Friedman, JH; Meulman, JJ. Clustering objects on subsets of attributes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2004, 66: 815-839.
[10] Hageman, J. A.; van den Berg, R. A.; Westerhuis, J. A.; van der Werf, M. J.; Smilde, A. K. Genetic algorithm based two-mode clustering of metabolomics data. Metabolomics, 2008, 4(2): 141-149.