Cluster analysis for the selection of potential discriminatory variables and the identification of subgroups in archaeometry

Cited by: 3
Authors
Lopez-Garcia, Pedro A. [1]
Argote, Denisse L. [2]
Affiliations
[1] Escuela Nacl Antropol Hist, Posgrad Arqueol, Perifer Sur Esq,Calle Zapote,Col Isidro Fabela, Mexico City, Mexico
[2] Inst Nacl Antropol & Hist, Direcc Estudios Arqueol, Tacuba 76,Colonia Ctr, Mexico City, Mexico
Keywords
Archaeological glass; High-dimensional data; Dimensionality reduction; Feature selection; Databionic Swarm; Data visualization; COMPOSITIONAL DATA-ANALYSIS; R PACKAGE; MODEL; GLASS; CLASSIFICATION; KNOWLEDGE; ANTWERP
DOI
10.1016/j.jasrep.2023.104022
Chinese Library Classification number
K85 [Cultural relics and archaeology]
Discipline classification code
0601
Abstract
In this article, three variable selection methods based on Gaussian mixture models were compared to find a subset of variables that provided the "best" clustering. The use of an appropriate transformation for compositional data, whose geometric space is the simplex, is emphasized. The comparison revealed the ability of the models to cluster data in multiple phases and showed that selecting the relevant variables is more convenient than performing an analysis based on 2D plots or simultaneously including all the available variables in a multivariate analysis. Once the informative variables for the clustering were obtained, we used a method called Databionic Swarm (DBS). This method uses unsupervised machine learning, taking advantage of emergence and swarm intelligence to find natural chemical groups in the input data space. DBS can visualize high-dimensional distances in the projection through a 3D topographic map with hypsometric tints. The results were compared in terms of accuracy, both in the selection of the variables and in the classification, using a supervised accuracy index for clustering and two unsupervised indexes (the Heatmap and the Silhouette plot). The concepts and methods were illustrated by applying them to two published archaeological glass data sets. The first set consisted of 245 Romano-British glass vessels and the second of 180 glass vessels from 15th-17th century Antwerp. In these applications, it was found that the methods for the selection of variables increased the accuracy of the classification compared to traditional methods.
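The abstract stresses transforming compositional data before clustering but does not name the transformation used. As a minimal sketch, assuming the centered log-ratio (clr) transform, one common choice for data on the simplex, the step could look like the following. The function name `clr` and the sample values are hypothetical and are not taken from the article or its data sets.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one strictly positive composition.

    Assumption: clr is used here only as an illustration; the article may
    rely on a different log-ratio transformation.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts; treat zeros first")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log removes the arbitrary scale of the composition.
    return [value - mean_log for value in logs]


# Hypothetical oxide percentages for a single glass sample (illustrative only).
sample = [70.0, 16.0, 8.0, 3.0, 3.0]
transformed = clr(sample)

# The same sample expressed as fractions instead of percentages.
rescaled = clr([part / 100.0 for part in sample])
```

The transformed values sum to zero, and rescaling the composition (percentages versus fractions) leaves them unchanged, which is why a log-ratio transform of this kind is usually applied before fitting models such as Gaussian mixtures to compositional measurements.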
Pages: 22