Geometric Morphometric Data Augmentation Using Generative Computational Learning Algorithms

Cited by: 9
Authors
Courtenay, Lloyd A. [1]
Gonzalez-Aguilera, Diego [1]
Affiliations
[1] Univ Salamanca, Higher Polytech Sch Avila, Dept Cartog & Terrain Engn, Hornos Caleros 50, Avila 05003, Spain
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 24
Keywords
archaeological data science; artificial intelligence; unsupervised learning; generative adversarial networks; robust statistics; NEURAL-NETWORKS; EQUIVALENCE; TESTS; SMOTE;
DOI
10.3390/app10249133
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Featured Application: Geometric Morphometrics are a powerful multivariate statistical toolset for the analysis of morphology. While typically used in the study of biological and anatomical variance, modern applications now incorporate these tools into a number of non-biological fields. Nevertheless, as in many fields of data science, Geometric Morphometric techniques are often impeded by issues concerning sample size. The present study thus evaluates a number of computational learning algorithms for the augmentation of different datasets. Here we show how generative algorithms from Artificial Intelligence are able to produce highly realistic synthetic data, helping improve the quality of any statistical or predictive modelling applications that may follow.

The fossil record is notorious for being incomplete and distorted, frequently conditioning the type of knowledge that can be extracted from it. This often leads to issues when performing complex statistical analyses, such as the classification tasks, predictive modelling, and variance analyses used in Geometric Morphometrics. Here, different Generative Adversarial Network architectures are tested, examining the effects of sample size and domain dimensionality on model performance. For model evaluation, robust statistical methods were used. Each of the algorithms was observed to produce realistic data. Generative Adversarial Networks using different loss functions produced multidimensional synthetic data significantly equivalent to the original training data. Conditional Generative Adversarial Networks were not as successful. The methods proposed are likely to reduce the impact of sample size and bias on a number of statistical learning applications. While Generative Adversarial Networks are not the solution to all sample-size-related issues, these limitations may be overcome when they are combined with other pre-processing steps. This presents a valuable means of augmenting geometric morphometric datasets for greater predictive visualization.
Pages: 25