New exploratory clustering tool

Times cited: 11
Authors
Acar, Evrim [1]
Bro, Rasmus [2]
Schmidt, Bonnie [3]
Affiliations
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] Univ Copenhagen, Fac Life Sci, Dept Food Sci, Copenhagen, Denmark
[3] Univ Copenhagen, Fac Pharmaceut Sci, Dept Med Chem, Copenhagen, Denmark
Keywords
data mining; clustering; multiway models; higher-order data; visualization
DOI
10.1002/cem.1106
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes a clustering method for three-way arrays that uses an exploratory visualization approach. The aim is to cluster the samples in the object mode of a three-way array, using the scores (sample loadings) of a three-way factor model such as a Tucker3 or PARAFAC model. Further, tools are developed to explore and identify the reasons for particular clusters by visually mining the data, with the clustering results as guidance. We introduce a three-way clustering tool and demonstrate it on a metabolite profiling dataset. We explore how high performance liquid chromatography (HPLC) measurements of commercial extracts of St. John's wort (natural remedies for the treatment of mild to moderate depression) differ, and which chemical compounds account for those differences. Using common distance measures, such as the Euclidean or Mahalanobis distance, on the scores of a three-way model, we verify that we can capture the underlying cluster structure in the data. In addition, the visualization approach allows us to identify the variables that play a significant role in the extracted cluster structure. The suggested approach generalizes straightforwardly to higher-order data and also to two-way data. Copyright (c) 2007 John Wiley & Sons, Ltd.
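The clustering step described in the abstract (distances computed on the sample-mode scores, then grouping the samples) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the score matrix (one row per sample, one column per PARAFAC or Tucker3 component) has already been obtained from a fitted model, which is not shown. The function names are hypothetical, average-linkage agglomerative clustering is a choice made here for illustration, and the Mahalanobis distance uses the sample covariance of the scores across all samples. Only the Python standard library is used so the steps stay visible.

```python
import math
from itertools import combinations


def covariance(rows):
    """Sample covariance (n - 1 denominator) of the score columns."""
    n = len(rows)
    mu = [sum(col) / n for col in zip(*rows)]
    r = len(mu)
    cov = [[0.0] * r for _ in range(r)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(r):
            for j in range(r):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for a small R x R matrix)."""
    r = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(r)]
           for i, row in enumerate(matrix)]
    for col in range(r):
        pivot = max(range(col, r), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("score covariance is singular; Mahalanobis distance undefined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for k in range(r):
            if k != col:
                f = aug[k][col]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[col])]
    return [row[r:] for row in aug]


def euclidean(u, v, _inv_cov=None):
    """Euclidean distance; the third argument is accepted but ignored."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mahalanobis(u, v, inv_cov):
    """Mahalanobis distance sqrt(d^T S^-1 d), where S is the covariance of the scores."""
    d = [a - b for a, b in zip(u, v)]
    r = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(r) for j in range(r))
    return math.sqrt(max(0.0, q))  # guard against tiny negative rounding error


def cluster_distance(c1, c2, dist):
    """Average linkage: mean pairwise distance between members of two clusters."""
    total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_scores(scores, n_clusters, metric=euclidean):
    """Agglomerative (average-linkage) clustering of samples, i.e. rows of the score matrix."""
    inv_cov = invert(covariance(scores)) if metric is mahalanobis else None
    dist = {(i, j): metric(scores[i], scores[j], inv_cov)
            for i, j in combinations(range(len(scores)), 2)}
    clusters = [[i] for i in range(len(scores))]
    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters under average linkage.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], dist))
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical two-component scores for six samples forming three groups.
    scores = [[0.0, 0.0], [0.2, 0.1], [10.0, 0.0],
              [10.1, 0.2], [0.0, 10.0], [0.1, 10.2]]
    print(cluster_scores(scores, 3))                      # [[0, 1], [2, 3], [4, 5]]
    print(cluster_scores(scores, 3, metric=mahalanobis))  # [[0, 1], [2, 3], [4, 5]]
```

With real data, SciPy's `scipy.spatial.distance.pdist` (which supports a `'mahalanobis'` metric) combined with `scipy.cluster.hierarchy.linkage(..., method='average')` and `fcluster(..., criterion='maxclust')` would do the same job in far less code. Because the covariance here is estimated from all samples, the Mahalanobis distances are relative to the overall spread of the scores rather than to the spread within any one cluster.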
Pages: 91-100
Page count: 10