Cluster analysis for the selection of potential discriminatory variables and the identification of subgroups in archaeometry

Cited by: 3
Authors
Lopez-Garcia, Pedro A. [1 ]
Argote, Denisse L. [2 ]
Affiliations
[1] Escuela Nacl Antropol Hist, Posgrad Arqueol, Perifer Sur Esq,Calle Zapote,Col Isidro Fabela, Mexico City, Mexico
[2] Inst Nacl Antropol & Hist, Direcc Estudios Arqueol, Tacuba 76,Colonia Ctr, Mexico City, Mexico
Keywords
Archaeological glass; High-dimensional data; Dimensionality reduction; Feature selection; Databionic Swarm; Data visualization; COMPOSITIONAL DATA-ANALYSIS; R PACKAGE; MODEL; GLASS; CLASSIFICATION; KNOWLEDGE; ANTWERP
DOI
10.1016/j.jasrep.2023.104022
Chinese Library Classification number
K85 [Cultural relics and archaeology]
Discipline classification code
0601
Abstract
In this article, three variable selection methods based on Gaussian mixture models were compared to find a subset of variables that provided the "best" clustering. The use of an appropriate transformation for compositional data, whose geometric space is the Simplex, is emphasized. The comparison revealed the ability of the models to cluster data in multiple phases, and showed that selecting the relevant variables is more convenient than performing an analysis based on 2D plots or including all the available variables simultaneously in a multivariate analysis. Once the informative variables for the clustering were obtained, we used a method called Databionic Swarm (DBS). This method uses unsupervised machine learning, taking advantage of emergence and swarm intelligence to find natural chemical groups in the input data space. DBS can visualize high-dimensional distances in the projection through a 3D topographic map with hypsometric tints. The results were compared in terms of accuracy, both in the selection of the variables and in the classification, using a supervised accuracy index for clustering and two unsupervised indexes (the Heatmap and the Silhouette plot). The concepts and methods were illustrated by applying them to two published archaeological glass data sets. The first set consisted of 245 Romano-British glass vessels and the second of 180 glass vessels from 15th-17th century Antwerp. In these applications, the variable selection methods increased the accuracy of the classification compared to traditional methods.
Pages: 22
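
The abstract stresses using a transformation suited to compositional data on the Simplex but does not name one. As a minimal sketch, assuming the centered log-ratio (clr) transform as one common choice (not necessarily the authors'), the code below maps a single composition to log-ratio coordinates. The function name `clr` and the sample values are hypothetical and are not taken from either glass data set.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one composition (assumed choice of transform)."""
    # Logarithms are undefined for zero or negative parts, so reject them here;
    # zero replacement is a separate step that this sketch does not cover.
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean of the parts
    return [value - mean_log for value in logs]


# Hypothetical oxide percentages for one glass sample (illustrative values only).
sample = [70.0, 15.0, 8.0, 5.0, 2.0]
coords = clr(sample)
```

The clr coordinates sum to zero and do not change when the composition is rescaled (for example from percentages to proportions), which is why log-ratio transforms are commonly applied before fitting mixture models to compositional data.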