Missing genotype imputation in non-model species using self-organizing maps

Times cited: 1
Authors
Mora-Marquez, Fernando [1]
Nuno, Juan Carlos [2]
Soto, Alvaro [1]
de Heredia, Unai Lopez [1]
Affiliations
[1] Univ Politecn Madrid, Dept Sistemas & Recursos Nat, GI Especies Lenosas WooSp, ETSI Montes Forestal & Medio Nat, Jose Antonio Novais 10,Ciudad Univ, Madrid 28040, Spain
[2] Univ Politecn Madrid, Dept Matemat Aplicada, GI Especies Lenosas WooSp, ETSI Montes Forestal & Medio Nat, Ciudad Univ, Madrid, Spain
Keywords
imputation; machine learning; missing data; SNP genotyping; SOM; ASSOCIATION; ALGORITHM; INFERENCE; RADSEQ; PCA;
DOI
10.1111/1755-0998.13992
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Current methodologies for genome-wide single-nucleotide polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact of missing data on downstream analyses, and several algorithms are available to fill in missing genotypes. The lack of reference haplotype panels precludes the use of these methods in genomic studies on non-model organisms. As an alternative, machine learning algorithms are employed to explore the genotype data and to estimate the missing genotypes. Here, we propose an imputation method based on self-organizing maps (SOM), a widely used type of neural network formed by spatially distributed neurons, in which similar inputs are mapped to neighbouring neurons. The method explores genotype datasets to select SNP loci, builds binary vectors from the genotypes at those loci, and initializes and trains a neural network for each query missing SNP genotype. The SOM-derived clustering is then used to impute the best genotype. To automate the imputation process, we have implemented gtImputation, an open-source application written in Python 3 with a user-friendly GUI. The method's performance was validated by comparing its accuracy, precision and sensitivity on several benchmark genotype datasets with those of other available imputation algorithms. Our approach produced highly accurate and precise genotype imputations even for SNPs with low-frequency alleles, and outperformed other algorithms, especially for datasets from mixed populations with unrelated individuals.
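
The general idea described in the abstract (binary vectors built from genotypes, one SOM trained per query missing genotype, and the resulting clustering used to pick a genotype) can be illustrated with a minimal sketch. This is not the gtImputation implementation: the genotype coding (0, 1, 2 alternate-allele counts, with -1 for missing), the one-hot encoding, the grid size, the learning schedule, the majority vote within a neuron, and all function names are assumptions made for illustration only. In particular, the sketch uses every other observed locus as input, whereas the paper describes a locus selection step that is not reproduced here.

```python
import math
import random
from collections import Counter

MISSING = -1  # assumed code for a missing genotype call


def encode(genotypes):
    """One-hot encode genotypes (0, 1, 2) into a flat binary vector."""
    vec = []
    for g in genotypes:
        vec.extend(1.0 if g == k else 0.0 for k in (0, 1, 2))
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], x))


def train_som(data, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    steps = epochs * len(data)
    for t in range(steps):
        frac = 1.0 - t / steps                    # decays from 1 towards 0
        lr = 0.5 * frac + 0.01                    # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5
        x = data[rng.randrange(len(data))]
        bmu = best_unit(weights, x)
        for pos, w in weights.items():
            d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights


def impute(matrix, sample, locus):
    """Impute matrix[sample][locus] from a SOM trained for this query only."""
    # Predictor loci: every other locus observed in the query sample.
    others = [j for j in range(len(matrix[sample]))
              if j != locus and matrix[sample][j] != MISSING]
    # Donors: samples with the query locus and all predictor loci observed.
    donors = [row for i, row in enumerate(matrix)
              if i != sample and row[locus] != MISSING
              and all(row[j] != MISSING for j in others)]
    if not others or not donors:
        raise ValueError("not enough observed data to impute this genotype")
    weights = train_som([encode([row[j] for j in others]) for row in donors])
    # Record which genotypes at the query locus fall on each neuron.
    votes = {}
    for row in donors:
        pos = best_unit(weights, encode([row[j] for j in others]))
        votes.setdefault(pos, Counter())[row[locus]] += 1
    # Map the query to the nearest occupied neuron and take the majority call.
    query = encode([matrix[sample][j] for j in others])
    pos = min(votes, key=lambda p: sq_dist(weights[p], query))
    return votes[pos].most_common(1)[0][0]
```

Calling `impute(matrix, sample, locus)` on a list of per-sample genotype lists returns a single genotype code for the requested cell. Training a separate map per missing genotype keeps each map focused on the samples and loci that are informative for that query, at the cost of repeated training.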
Pages: 16