New exploratory clustering tool

Cited by: 11
Authors
Acar, Evrim [1]
Bro, Rasmus [2]
Schmidt, Bonnie [3]
Affiliations
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] Univ Copenhagen, Fac Life Sci, Dept Food Sci, Copenhagen, Denmark
[3] Univ Copenhagen, Fac Pharmaceut Sci, Dept Med Chem, Copenhagen, Denmark
Keywords
data mining; clustering; multiway models; higher-order data; visualization
DOI
10.1002/cem.1106
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes a method for clustering three-way arrays that makes use of an exploratory visualization approach. The aim of this study is to cluster samples in the object mode of a three-way array, which is done using the scores (sample loadings) of a three-way factor model, for example, a Tucker3 or a PARAFAC model. Further, tools are developed to explore and identify reasons for particular clusters by visually mining the data using the clustering results as guidance. We introduce a three-way clustering tool and demonstrate our results on a metabolite profiling dataset. We explore how high-performance liquid chromatography (HPLC) measurements of commercial extracts of St. John's wort (natural remedies for the treatment of mild to moderate depression) differ and which chemical compounds account for those differences. Using common distance measures, for example, Euclidean or Mahalanobis, on the scores of a three-way model, we verify that we can capture the underlying clustering structure in the data. In addition, by making use of the visualization approach, we are able to identify the variables that play a significant role in the extracted cluster structure. The suggested approach generalizes straightforwardly to higher-order data and also to two-way data. Copyright (c) 2007 John Wiley & Sons, Ltd.
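The idea summarized in the abstract (fit a three-way factor model, then cluster the samples by distances between their scores) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' tool: the plain alternating-least-squares PARAFAC fit, the single-linkage grouping, the synthetic data, and all function names (`cp_als`, `pairwise_distances`, `single_linkage`) are assumptions introduced here, and the Mahalanobis variant assumes a covariance estimated from all scores pooled together.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product, shape (J*K, R)
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def cp_als(X, rank, n_iter=200, seed=0):
    """Fit a PARAFAC (CP) model to X (I x J x K) by plain ALS (assumed solver)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                     # object-mode unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # second-mode unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # third-mode unfolding
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    # Move component magnitudes into the object-mode scores A
    nb, nc = np.linalg.norm(B, axis=0), np.linalg.norm(C, axis=0)
    return A * nb * nc, B / nb, C / nc

def pairwise_distances(S, metric="euclidean"):
    """Distances between rows of the score matrix S."""
    diff = S[:, None, :] - S[None, :, :]
    if metric == "mahalanobis":
        # Assumption: covariance estimated from all scores pooled together
        VI = np.linalg.pinv(np.atleast_2d(np.cov(S, rowvar=False)))
        d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
        return np.sqrt(np.maximum(d2, 0.0))
    return np.sqrt((diff ** 2).sum(axis=-1))

def single_linkage(D, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    labels = np.arange(D.shape[0])
    while len(np.unique(labels)) > n_clusters:
        same = labels[:, None] == labels[None, :]
        Dm = np.where(same, np.inf, D)           # ignore within-cluster pairs
        i, j = np.unravel_index(np.argmin(Dm), Dm.shape)
        labels[labels == labels[j]] = labels[i]
    return labels

# Synthetic three-way array: 10 samples in two groups (hypothetical data)
rng = np.random.default_rng(1)
A_true = np.vstack([[1.0, 0.1]] * 5 + [[0.1, 1.0]] * 5)
A_true = A_true + 0.02 * rng.standard_normal(A_true.shape)
B_true = rng.standard_normal((6, 2))
C_true = rng.standard_normal((4, 2))
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)
X = X + 0.01 * rng.standard_normal(X.shape)

A_hat, B_hat, C_hat = cp_als(X, rank=2)
D_euc = pairwise_distances(A_hat, "euclidean")
labels = single_linkage(D_euc, n_clusters=2)
print(labels)
```

Passing `"mahalanobis"` to `pairwise_distances` gives the second distance measure named in the abstract. The loadings `B_hat` and `C_hat` of the other two modes are the natural place to look when asking which variables drive a given cluster, although the visual exploration described in the abstract is not reproduced here.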
Pages: 91-100
Number of pages: 10