Missing genotype imputation in non-model species using self-organizing maps

Cited by: 1
Authors
Mora-Marquez, Fernando [1]
Nuno, Juan Carlos [2]
Soto, Alvaro [1]
de Heredia, Unai Lopez [1]
Affiliations
[1] Univ Politecn Madrid, Dept Sistemas & Recursos Nat, GI Especies Lenosas WooSp, ETSI Montes Forestal & Medio Nat, Jose Antonio Novais 10,Ciudad Univ, Madrid 28040, Spain
[2] Univ Politecn Madrid, Dept Matemat Aplicada, GI Especies Lenosas WooSp, ETSI Montes Forestal & Medio Nat, Ciudad Univ, Madrid, Spain
Keywords
imputation; machine learning; missing data; SNP genotyping; SOM; ASSOCIATION; ALGORITHM; INFERENCE; RADSEQ; PCA
DOI
10.1111/1755-0998.13992
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Current methodologies of genome-wide single-nucleotide polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact on downstream analysis, and several algorithms are available to fill in missing genotypes. However, the lack of reference haplotype panels precludes the use of these methods in genomic studies of non-model organisms. As an alternative, machine learning algorithms can be employed to explore the genotype data and estimate the missing genotypes. Here, we propose an imputation method based on self-organizing maps (SOM), a widely used type of neural network formed by spatially distributed neurons that cluster similar inputs into nearby neurons. The method explores genotype datasets to select the SNP loci from which binary vectors are built from the genotypes, and initializes and trains neural networks for each query missing SNP genotype. The SOM-derived clustering is then used to impute the best genotype. To automate the imputation process, we have implemented gtImputation, an open-source application written in Python3 with a user-friendly GUI. The method's performance was validated by comparing its accuracy, precision and sensitivity with those of other available imputation algorithms on several benchmark genotype datasets. Our approach produced highly accurate and precise genotype imputations, even for SNPs with low-frequency alleles, and outperformed other algorithms, especially for datasets from mixed populations with unrelated individuals.
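The general idea in the abstract (encode genotypes as binary vectors, train a SOM, impute from the resulting clusters) can be illustrated with a minimal pure-Python sketch. This is not the gtImputation implementation. The one-hot encoding, the grid size, the decay schedules, the data-based weight initialization, the majority-vote rule, and all function names (`encode`, `train_som`, `impute`) are assumptions chosen for illustration, and the locus-selection step is omitted.

```python
import math
import random
from collections import Counter

# Assumed encoding (not taken from the paper): one-hot code per biallelic genotype,
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
ONE_HOT = {0: (1.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0), 2: (0.0, 0.0, 1.0)}


def encode(genotypes):
    """Concatenate the one-hot codes of a sample's genotypes at the predictor loci."""
    return [bit for g in genotypes for bit in ONE_HOT[g]]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vector):
    """Index of the neuron whose weight vector is closest to `vector`."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], vector))


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns (weights, grid coordinates)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize each neuron from a training vector (cycling through the data).
    weights = [list(vectors[i % len(vectors)]) for i in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        for vector in rng.sample(vectors, len(vectors)):  # shuffled presentation
            bmu = best_matching_unit(weights, vector)
            for i, w in enumerate(weights):
                # Gaussian neighborhood: neurons near the BMU on the grid move more.
                h = lr * math.exp(-sq_dist(coords[i], coords[bmu]) / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += h * (vector[k] - w[k])
    return weights, coords


def impute(weights, coords, train_vectors, train_targets, query_vector):
    """Majority vote among training samples in the occupied neuron nearest the query's BMU."""
    votes = {}
    for vec, target in zip(train_vectors, train_targets):
        votes.setdefault(best_matching_unit(weights, vec), Counter())[target] += 1
    q = best_matching_unit(weights, query_vector)
    nearest = min(votes, key=lambda i: sq_dist(coords[i], coords[q]))
    return votes[nearest].most_common(1)[0][0]


# Toy data: two groups of samples with distinct genotypes at three predictor loci.
predictors = [(0, 0, 0), (2, 2, 2)] * 3
targets = [0, 2] * 3  # known genotypes at the locus to be imputed
vectors = [encode(g) for g in predictors]
weights, coords = train_som(vectors)
imputed = impute(weights, coords, vectors, targets, encode((0, 0, 0)))
```

In this toy case the query sample shares its predictor genotypes with the first group, so `imputed` takes that group's genotype at the target locus. Real data would need a locus-selection step and a strategy for missing values among the predictors, neither of which is covered here.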
Pages: 16
Related papers
50 in total
  • [1] Clustering of ant communities and indicator species analysis using self-organizing maps
    Park, Sang-Hyun
    Hosoishi, Shingo
    Ogata, Kazuo
    Kuboki, Yuzuru
    COMPTES RENDUS BIOLOGIES, 2014, 337 (09) : 545 - 552
  • [2] A granular computing framework for self-organizing maps
    Herbert, Joseph P.
    Yao, JingTao
    NEUROCOMPUTING, 2009, 72 (13-15) : 2865 - 2872
  • [3] Theoretical and Applied Aspects of the Self-Organizing Maps
    Cottrell, Marie
    Olteanu, Madalina
    Rossi, Fabrice
    Villa-Vialaneix, Nathalie
    ADVANCES IN SELF-ORGANIZING MAPS AND LEARNING VECTOR QUANTIZATION, WSOM 2016, 2016, 428 : 3 - 26
  • [4] Automatic Feature Engineering Using Self-Organizing Maps
    Rodrigues, Ericks da Silva
    Martins, Denis Mayr Lima
    de Lima Neto, Fernando Buarque
    2021 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI), 2021
  • [5] LinkImputeR: user-guided genotype calling and imputation for non-model organisms
    Daniel Money
    Zoë Migicovsky
    Kyle Gardner
    Sean Myles
    BMC Genomics, 18
  • [6] IntraSOM: A comprehensive Python library for Self-Organizing Maps with hexagonal toroidal maps training and missing data handling
    de Gouvea, Rodrigo Cesar Teixeira
    Gioria, Rafael dos Santos
    Marques, Gustavo Rodovalho
    Carneiro, Cleyton de Carvalho
    SOFTWARE IMPACTS, 2023, 17
  • [7] LinkImputeR: user-guided genotype calling and imputation for non-model organisms
    Money, Daniel
    Migicovsky, Zoe
    Gardner, Kyle
    Myles, Sean
    BMC GENOMICS, 2017, 18
  • [8] Self-organizing maps for texture classification
    Nedyalko Petrov
    Antoniya Georgieva
    Ivan Jordanov
    Neural Computing and Applications, 2013, 22 : 1499 - 1508
  • [9] Model-Based Clustering by Probabilistic Self-Organizing Maps
    Cheng, Shih-Sian
    Fu, Hsin-Chia
    Wang, Hsin-Min
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2009, 20 (05) : 805 - 826
  • [10] A faster dynamic convergency approach for self-organizing maps
    Jamil, Akhtar
    Hameed, Alaa Ali
    Orman, Zeynep
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (01) : 677 - 696