Cluster analysis for the selection of potential discriminatory variables and the identification of subgroups in archaeometry

Cited by: 3

Authors:

Lopez-Garcia, Pedro A. [1]

Argote, Denisse L. [2]

Affiliations:

[1] Escuela Nacl Antropol Hist, Posgrad Arqueol, Perifer Sur Esq,Calle Zapote,Col Isidro Fabela, Mexico City, Mexico

[2] Inst Nacl Antropol & Hist, Direcc Estudios Arqueol, Tacuba 76,Colonia Ctr, Mexico City, Mexico

Source:

JOURNAL OF ARCHAEOLOGICAL SCIENCE-REPORTS | 2023 / Volume 49

Keywords:

Archaeological glass; High-dimensional data; Dimensionality reduction; Feature selection; Databionic Swarm; Data visualization; COMPOSITIONAL DATA-ANALYSIS; R PACKAGE; MODEL; GLASS; CLASSIFICATION; KNOWLEDGE; ANTWERP;

DOI:

10.1016/j.jasrep.2023.104022

Chinese Library Classification:

K85 [Cultural relics and archaeology];

Discipline classification code:

0601;

Abstract:

In this article, three variable selection methods based on Gaussian mixture models were compared to find a subset of variables that provided the "best" clustering. The use of an appropriate transformation for compositional data, whose geometric space is the Simplex, is emphasized. The comparison revealed the ability of the models to cluster data in multiple phases and showed that selecting the relevant variables is more convenient than performing an analysis based on 2D plots or simultaneously including all the available variables in a multivariate analysis. Once the informative variables for the clustering were obtained, we used a method called Databionic Swarm (DBS). This method uses unsupervised machine learning, taking advantage of emergence and swarm intelligence to find natural chemical groups in the input data space. DBS can visualize high-dimensional distances in the projection through a 3D topographic map with hypsometric tints. The results were compared in terms of accuracy, both in the selection of the variables and in the classification, using a supervised accuracy index for clustering and two unsupervised indexes (the Heatmap and the Silhouette plot). The concepts and methods were illustrated by applying them to two published archaeological glass data sets. The first set consisted of 245 Romano-British glass vessels and the second set of 180 glass vessels from the 15th-17th century in Antwerp. In these applications, it was found that the methods for the selection of variables increased the accuracy of the classification compared to traditional methods.
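The abstract stresses transforming compositional data before clustering but does not name the transformation used. As a minimal sketch only, the centered log-ratio (clr) transform is one common choice for moving compositions out of the Simplex; the function name `clr` and the sample values below are hypothetical and are not taken from the article.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a single composition.

    Assumption: clr is used here for illustration; the article may rely on a
    different log-ratio transformation. All parts must be strictly positive.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("clr is undefined for zero or negative parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean of the parts
    return [value - mean_log for value in logs]


# Hypothetical oxide percentages for one glass sample (illustrative values only)
sample = [70.0, 15.0, 8.0, 7.0]
transformed = clr(sample)
```

The transformed coordinates sum to zero and do not change if the composition is rescaled (for example, percentages versus proportions), which is why log-ratio coordinates are usually preferred over raw percentages as input to mixture-model clustering.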


Pages: 22

