Cluster analysis for the selection of potential discriminatory variables and the identification of subgroups in archaeometry

Cited by: 3
Authors
Lopez-Garcia, Pedro A. [1 ]
Argote, Denisse L. [2 ]
Affiliations
[1] Escuela Nacl Antropol Hist, Posgrad Arqueol, Perifer Sur Esq,Calle Zapote,Col Isidro Fabela, Mexico City, Mexico
[2] Inst Nacl Antropol & Hist, Direcc Estudios Arqueol, Tacuba 76,Colonia Ctr, Mexico City, Mexico
Keywords
Archaeological glass; High-dimensional data; Dimensionality reduction; Feature selection; Databionic Swarm; Data visualization; COMPOSITIONAL DATA-ANALYSIS; R PACKAGE; MODEL; GLASS; CLASSIFICATION; KNOWLEDGE; ANTWERP
DOI
10.1016/j.jasrep.2023.104022
Chinese Library Classification number
K85 [Cultural relics and archaeology]
Discipline classification code
0601
Abstract
In this article, three variable selection methods based on Gaussian mixture models were compared to find a subset of variables that provided the "best" clustering. The use of an appropriate transformation for compositional data, whose geometric space is the Simplex, is emphasized. The comparison revealed the ability of the models to cluster data in multiple phases and showed that selecting the relevant variables is more convenient than performing an analysis based on 2D plots or simultaneously including all the available variables in a multivariate analysis. Once the informative variables for the clustering were obtained, we used a method called Databionic Swarm (DBS). This method uses unsupervised machine learning, taking advantage of emergence and swarm intelligence to find natural chemical groups in the input data space. DBS can visualize high-dimensional distances in the projection through a 3D topographic map with hypsometric tints. The results were compared in terms of accuracy, both in the selection of the variables and in the classification, using a supervised accuracy index for clustering and two unsupervised indexes (the Heatmap and the Silhouette plot). The concepts and methods were illustrated by applying them to two published archaeological glass data sets. The first set consisted of 245 Romano-British glass vessels and the second of 180 glass vessels from the 15th-17th century in Antwerp. In these applications, it was found that the methods for the selection of variables increased the accuracy of the classification compared to traditional methods.
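The abstract stresses transforming compositional data before clustering but does not name the transformation used. As an illustration only, the centered log-ratio (clr) is one common way to move a composition out of the Simplex; the sketch below assumes that choice, and the function name `clr` and the sample values are hypothetical, not taken from the article.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one strictly positive composition.

    Assumption: clr is used here as an example transformation; the
    article's abstract does not state which transformation was applied.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts; handle zeros first")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing each part by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical oxide percentages for a single glass sample (illustrative only).
sample = [70.0, 15.0, 8.0, 5.0, 2.0]
transformed = clr(sample)
```

The transformed coordinates sum to zero and do not change when the whole composition is rescaled, which is why percentages and proportions give the same result.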
Pages: 22
Related papers
50 in total
  • [31] Redundant Feature Identification and Redundancy Analysis for Causal Feature Selection
    Limshuebchuey, Asavaron
    Duangsoithong, Rakkrit
    Windeatt, Terry
    2015 8TH BIOMEDICAL ENGINEERING INTERNATIONAL CONFERENCE (BMEICON), 2015,
  • [32] Effect of dimensionality reduction on stock selection with cluster analysis in different market situations
    Han, Jingti
    Ge, Zhipeng
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 147
  • [33] Data-driven cluster analysis of insomnia disorder with physiology-based qEEG variables
    McCloskey, Stephen
    Jeffries, Bryn
    Koprinska, Irena
    Miller, Christopher B.
    Grunstein, Ronald R.
    KNOWLEDGE-BASED SYSTEMS, 2019, 183
  • [34] Variable selection in discriminant analysis for mixed continuous-binary variables and several groups
    Mbina, Alban Mbina
    Nkiet, Guy Martial
    Obiang, Fulgence Eyi
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 (03) : 773 - 795
  • [35] Variable selection in discriminant analysis for mixed continuous-binary variables and several groups
    Alban Mbina Mbina
    Guy Martial Nkiet
    Fulgence Eyi Obiang
    Advances in Data Analysis and Classification, 2019, 13 : 773 - 795
  • [36] Identification of distinct subgroups of Sjogren's disease by cluster analysis based on clinical and biological manifestations: data from the cross-sectional Paris-Saclay and the prospective ASSESS cohorts
    Nguyen, Yann
    Nocturne, Gaetane
    Henry, Julien
    Ng, Wan-Fai
    Belkhir, Rakiba
    Desmoulins, Frederic
    Berge, Elisabeth
    Morel, Jacques
    Perdriger, Aleth
    Dernis, Emmanuelle
    Devauchelle-Pensec, Valerie
    Sene, Damien
    Dieude, Philippe
    Couderc, Marion
    Fauchais, Anne-Laure
    Larroche, Claire
    Vittecoq, Olivier
    Salliot, Carine
    Hachulla, Eric
    Le Guern, Veronique
    Gottenberg, Jacques-Eric
    Mariette, Xavier
    Seror, Raphaele
    LANCET RHEUMATOLOGY, 2024, 6 (04) : e216 - e225
  • [37] Constructing measures for school process variables: the potential of multilevel confirmatory factor analysis
    D'Haenens, Ellen
    Van Damme, Jan
    Onghena, Patrick
    QUALITY & QUANTITY, 2012, 46 (01) : 155 - 188
  • [38] The identification of high potential archers based on fitness and motor ability variables: A Support Vector Machine approach
    Taha, Zahari
    Musa, Rabiu Muazu
    Majeed, Anwar P. P. Abdul
    Alim, Muhammad Muaz
    Abdullah, Mohamad Razali
    HUMAN MOVEMENT SCIENCE, 2018, 57 : 184 - 193
  • [39] Cluster analysis of clinical manifestations assigns systemic lupus erythematosus-phenotype subgroups: A multicentre study on 440 patients
    Mariette, Fanny
    Le Guern, Veronique
    Nguyen, Yann
    Yelnik, Cecile
    Morel, Nathalie
    Hachulla, Eric
    Lambert, Marc
    Guettrot-Imbert, Gaelle
    Mouthon, Luc
    Ebbo, Mikael
    Costedoat-Chalumeau, Nathalie
    JOINT BONE SPINE, 2024, 91 (06)
  • [40] Latent class cluster analysis of symptom ratings identifies distinct subgroups within the clinical high risk for psychosis syndrome
    Ryan, Arthur T.
    Addington, Jean
    Bearden, Carrie E.
    Cadenhead, Kristin S.
    Cornblatt, Barbara A.
    Mathalon, Daniel H.
    McGlashan, Thomas H.
    Perkins, Diana O.
    Seidman, Larry J.
    Tsuang, Ming T.
    Woods, Scott W.
    Cannon, Tyrone D.
    Walker, Elaine F.
    SCHIZOPHRENIA RESEARCH, 2018, 197 : 522 - 530