ARG-based genome-wide analysis of cacao cultivars

Cited by: 4
Authors
Utro, Filippo [1]
Cornejo, Omar Eduardo [2]
Livingstone, Donald [3]
Motamayor, Juan Carlos [4]
Parida, Laxmi [1]
Affiliations
[1] IBM TJ Watson Res, Computat Biol Ctr, Yorktown Hts, NY 10598 USA
[2] Stanford Univ, Sch Med, Dept Genet, Stanford, CA 94305 USA
[3] USDA, Miami, FL 33186 USA
[4] Mars Inc, Miami, FL 33158 USA
Source
BMC Bioinformatics | 2012 / Vol. 13
Keywords
ANCESTRAL RECOMBINATIONS GRAPH; GENETIC DIVERSITY; NETWORKS; PATTERNS; MARKERS
DOI
10.1186/1471-2105-13-S19-S17
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: An ancestral recombinations graph (ARG) is a topological structure that captures the relationships among extant genomic sequences in terms of genetic events, including recombinations. IRiS is a system that estimates the ARG from the sequences of individuals at genomic scales, capturing the relationships among the individuals of a species. Recently, this system was used to estimate the ARG of the recombining X chromosome for a collection of human populations using relatively dense, bi-allelic SNP data.
Results: While the ARG is a natural model for capturing the relationships among copies of a single chromosome across the individuals of a species, it is not immediately apparent how the model can use whole-genome (across chromosomes) diploid data. In addition, the sheer complexity of an ARG structure presents a challenge to graph visualization techniques. In this paper we examine ARG reconstruction for data that are (1) genome-wide, spanning multiple chromosomes, (2) multi-allelic, and (3) extremely sparse. To aid in visualizing the reconstructed ARG, we additionally construct a much simplified topology, a classification tree, suggested by the ARG. As the test case, we study the problem of extracting the relationships between populations of Theobroma cacao. The chocolate tree is an outcrossing species in the wild because of its self-incompatibility mechanisms. Thus, a principled approach to understanding the relationships between the different populations must take the shuffling of genomic segments into account. The polymorphisms in the test data are short tandem repeats (STRs) and are multi-allelic (sometimes with as many as 30 distinct possible values at a locus). Each is at a genomic location that is bilaterally transmitted, hence the ARG is a natural model for these data. Another characteristic of this plant data set is that, while it is genome-wide, spanning 10 linkage groups or chromosomes, it is very sparse: only 96 loci from a genome of approximately 400 megabases. The results are visualized both as MDS plots and as classification trees. To evaluate the accuracy of the ARG approach, we compare the results with those available in the literature.
Conclusions: We have extended the ARG model to incorporate genome-wide data (an ensemble of multiple chromosomes) in a natural way, and we present a simple scheme to implement this in practice. This is also the first time that a plant population data set has been studied by estimating its underlying ARG. We demonstrate an overall precision of 0.92 and an overall recall of 0.93 for the ARG-based classification relative to the gold standard. While we have corroborated the classification of the samples against that in the literature, this work opens the door to other potential studies that can be made on the ARG.
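The abstract reports overall precision and recall of a classification against a gold standard but does not say how the per-population figures were aggregated. The sketch below is one plausible reading, not the authors' procedure: it assumes each sample carries exactly one gold label and one predicted label, and that "overall" means a macro-average over populations. The function names and the population labels `pop_A`, `pop_B`, `pop_C` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_precision_recall(gold, predicted):
    """Return {label: (precision, recall)} for paired gold/predicted labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # sample assigned to its gold population
        else:
            fp[p] += 1          # wrongly assigned to population p
            fn[g] += 1          # missed from its gold population g
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision and recall (an assumption)."""
    n = len(scores)
    mean_p = sum(p for p, _ in scores.values()) / n
    mean_r = sum(r for _, r in scores.values()) / n
    return mean_p, mean_r


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    gold = ["pop_A", "pop_A", "pop_B", "pop_B", "pop_C", "pop_C"]
    pred = ["pop_A", "pop_B", "pop_B", "pop_B", "pop_C", "pop_C"]
    scores = per_class_precision_recall(gold, pred)
    print(scores)
    print(macro_average(scores))
```

Under a macro-average, overall precision and recall can differ, as they do in the reported 0.92 and 0.93, whereas a micro-average over single-label assignments would make the two values equal.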
Pages: 11