ARG-based genome-wide analysis of cacao cultivars

Cited by: 4
Authors
Utro, Filippo [1]
Cornejo, Omar Eduardo [2]
Livingstone, Donald [3]
Motamayor, Juan Carlos [4]
Parida, Laxmi [1]
Affiliations
[1] IBM TJ Watson Res, Computat Biol Ctr, Yorktown Hts, NY 10598 USA
[2] Stanford Univ, Sch Med, Dept Genet, Stanford, CA 94305 USA
[3] USDA, Miami, FL 33186 USA
[4] Mars Inc, Miami, FL 33158 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Keywords
ANCESTRAL RECOMBINATIONS GRAPH; GENETIC DIVERSITY; NETWORKS; PATTERNS; MARKERS;
DOI
10.1186/1471-2105-13-S19-S17
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: An ancestral recombinations graph (ARG) is a topological structure that captures the relationships among extant genomic sequences in terms of genetic events, including recombinations. IRiS is a system that estimates the ARG from the sequences of individuals at genomic scale, capturing the relationships among those individuals of a species. Recently, this system was used to estimate the ARG of the recombining X chromosome of a collection of human populations using relatively dense, bi-allelic SNP data.
Results: While the ARG is a natural model for capturing the inter-relationships among copies of a single chromosome across the individuals of a species, it is not immediately apparent how the model can utilize whole-genome (across chromosomes) diploid data. Also, the sheer complexity of an ARG structure presents a challenge to graph visualization techniques. In this paper we examine ARG reconstruction for (1) genome-wide, or multiple-chromosome, data, (2) multi-allelic data, and (3) extremely sparse data. To aid in visualizing the reconstructed ARG, we additionally construct a much simplified topology, a classification tree, suggested by the ARG. As the test case, we study the problem of extracting the relationships between populations of Theobroma cacao. The chocolate tree is an outcrossing species in the wild, owing to self-incompatibility mechanisms. Thus a principled approach to understanding the inter-relationships between the different populations must take the shuffling of genomic segments into account. The polymorphisms in the test data are short tandem repeats (STRs) and are multi-allelic (sometimes with as many as 30 distinct possible values at a locus). Each is at a genomic location that is bilaterally transmitted, hence the ARG is a natural model for these data. Another characteristic of this plant data set is that, while it is genome-wide, spanning 10 linkage groups or chromosomes, it is very sparse: only 96 loci from a genome of approximately 400 megabases. The results are visualized both as MDS plots and as classification trees. To evaluate the accuracy of the ARG approach, we compare the results with those available in the literature.
Conclusions: We have extended the ARG model to incorporate genome-wide data (an ensemble of multiple chromosomes) in a natural way, and we present a simple scheme to implement this in practice. This is the first time that a plant population data set has been studied by estimating its underlying ARG. The ARG-based classification achieves an overall precision of 0.92 and an overall recall of 0.93 with respect to the gold standard. While we have corroborated the classification of the samples with that in the literature, this work opens the door to other potential studies that can be made on the ARG.
Pages: 11