Supervised learning algorithms in the classification of plant populations with different degrees of kinship

Cited by: 0
Authors
Leandro Skowronski
Paula Martin de Moraes
Mario Luiz Teixeira de Moraes
Wesley Nunes Gonçalves
Michel Constantino
Celso Soares Costa
Wellington Santos Fava
Reginaldo B. Costa
Affiliations
[1] Dom Bosco Catholic University, Laboratory of Ecology and Evolutionary Biology, Institute of Biosciences
[2] Federal University of Grande Dourados
[3] Paulista State University Júlio de Mesquita Filho
[4] Federal University of Mato Grosso do Sul
[5] Federal Institute of Education, Science and Technology of Mato Grosso do Sul
[6] Federal University of Mato Grosso do Sul
Source
Brazilian Journal of Botany | 2021 / Vol. 44
Keywords
Classification methods; Genetic improvement; Machine learning; Similarity between populations
DOI
Not available
Abstract
Population discrimination and the classification of individuals are important for genetic improvement in population studies and for the conservation of genetic diversity. Multivariate approaches are often used for these tasks, especially the Fisher and Anderson discriminant functions. New methodologies based on machine learning (ML) have shown promise for such procedures, but these methods still need further evaluation and comparison. The present study therefore evaluates the efficacy of supervised ML algorithms in classifying populations with different degrees of similarity, comparing them with the discriminant analysis techniques proposed by Anderson and by Fisher. The supervised ML methods tested were Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), Random Forest, Support Vector Machine (SVM) and Multi-layer Perceptron Neural Networks (MLP/ANN). To compare the classification methods, we used phenotypic data from populations with different degrees of genetic similarity. The data were derived from simulated genotypic information for different populations subjected to a backcrossing scheme. Mean accuracies over 30 repetitions of each classification method were compared using the Friedman and Nemenyi tests at a 95% confidence level. Classification methods based on ML algorithms outperformed the Fisher and Anderson discriminant functions, achieving high accuracy even where similarity between populations was higher. The kNN, Random Forest, SVM and Naive Bayes algorithms had the highest accuracy, surpassing the Decision Tree algorithm and even MLP/ANN (which lost accuracy at the 96.88% similarity condition between populations). Thus, the present work shows that ML techniques achieve greater accuracy in the discrimination and classification of populations, without the limitations of the statistical techniques.
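The comparison step named in the abstract (accuracies from 30 repetitions per method, compared by the Friedman and Nemenyi tests) can be illustrated with a minimal sketch. This is not the authors' code: the accuracy values are simulated placeholders, the three method labels and their base accuracies are hypothetical, and the functions `average_ranks`, `friedman_statistic` and `nemenyi_critical_difference` are names introduced here for illustration. The Friedman statistic is computed without a tie correction, and the Nemenyi critical value `q_alpha` must be supplied by the reader from a studentized-range table.

```python
import math
import random


def average_ranks(row):
    """Rank the accuracies of one repetition (rank 1 = highest).

    Tied accuracies share the mean of the ranks they occupy.
    """
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(acc):
    """Friedman chi-square (no tie correction) and mean rank per method.

    acc: list of n repetitions, each a list of k accuracies (one per method).
    """
    n, k = len(acc), len(acc[0])
    rank_sums = [0.0] * k
    for row in acc:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, [r / n for r in rank_sums]


def nemenyi_critical_difference(q_alpha, k, n):
    """Two methods differ if their mean ranks differ by more than this value."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


# Simulated placeholder accuracies: 30 repetitions x 3 hypothetical methods.
random.seed(0)
base = {"method_A": 0.95, "method_B": 0.93, "method_C": 0.85}  # made-up values
acc = [[b + random.gauss(0, 0.01) for b in base.values()] for _ in range(30)]

chi2, mean_ranks = friedman_statistic(acc)
print("Friedman chi-square:", round(chi2, 3))
print("Mean ranks:", dict(zip(base, mean_ranks)))
```

The chi-square value is compared against a chi-square distribution with k - 1 degrees of freedom. If the null hypothesis of equal performance is rejected, the Nemenyi critical difference identifies which pairs of methods differ in mean rank.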
Pages: 371-379
Page count: 8