ARG-based genome-wide analysis of cacao cultivars

Cited by: 4
Authors
Utro, Filippo [1]
Cornejo, Omar Eduardo [2]
Livingstone, Donald [3]
Motamayor, Juan Carlos [4]
Parida, Laxmi [1]
Affiliations
[1] IBM TJ Watson Res, Computat Biol Ctr, Yorktown Hts, NY 10598 USA
[2] Stanford Univ, Sch Med, Dept Genet, Stanford, CA 94305 USA
[3] USDA, Miami, FL 33186 USA
[4] Mars Inc, Miami, FL 33158 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Keywords
ANCESTRAL RECOMBINATIONS GRAPH; GENETIC DIVERSITY; NETWORKS; PATTERNS; MARKERS;
DOI
10.1186/1471-2105-13-S19-S17
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: The ancestral recombinations graph (ARG) is a topological structure that captures the relationships between extant genomic sequences in terms of genetic events, including recombinations. IRiS is a system that estimates the ARG from the sequences of individuals at genomic scales, capturing the relationships between those individuals of the species. Recently, this system was used to estimate the ARG of the recombining X chromosome of a collection of human populations using relatively dense, bi-allelic SNP data.

Results: While the ARG is a natural model for capturing the inter-relationships among individuals of a species on a single chromosome, it is not immediately apparent how the model can utilize whole-genome (across chromosomes) diploid data. Also, the sheer complexity of an ARG structure presents a challenge to graph visualization techniques. In this paper we examine ARG reconstruction for (1) genome-wide data spanning multiple chromosomes, (2) multi-allelic data and (3) extremely sparse data. To aid in visualizing the reconstructed ARG, we additionally construct a much simplified topology, a classification tree, suggested by the ARG. As the test case, we study the problem of extracting the relationships between populations of Theobroma cacao. The chocolate tree is an outcrossing species in the wild, owing to its self-incompatibility mechanisms. Thus a principled approach to understanding the inter-relationships between the different populations must take the shuffling of genomic segments into account. The polymorphisms in the test data are short tandem repeats (STRs) and are multi-allelic (sometimes with as many as 30 distinct possible values at a locus). Each is at a genomic location that is bilaterally transmitted, hence the ARG is a natural model for this data. Another characteristic of this plant data set is that, while it is genome-wide, spanning 10 linkage groups or chromosomes, it is very sparse: only 96 loci from a genome of approximately 400 megabases. The results are visualized both as MDS plots and as classification trees. To evaluate the accuracy of the ARG approach, we compare the results with those available in the literature.

Conclusions: We have extended the ARG model to incorporate genome-wide data (an ensemble of multiple chromosomes) in a natural way, and we present a simple scheme to implement this in practice. This is the first time that a plant population data set has been studied by estimating its underlying ARG. We demonstrate an overall precision of 0.92 and an overall recall of 0.93 for the ARG-based classification with respect to the gold standard. While we have corroborated the classification of the samples with that in the literature, this work opens the door to other potential studies that can be carried out on the ARG.
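The abstract reports an overall precision and recall of the classification against a gold standard, but it does not say how the per-population values were combined. As an illustration only, the sketch below computes macro-averaged precision and recall (one common convention, assumed here and not confirmed by the source). The function name `macro_precision_recall` and the toy labels "A", "B", "C" are hypothetical and are not data from the paper.

```python
from collections import Counter


def macro_precision_recall(gold, predicted):
    """Return (precision, recall), each macro-averaged over the gold classes.

    Assumption: "overall" is read as an unweighted mean of per-class
    values; the paper may use a different averaging scheme.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos = Counter()    # samples assigned to their correct class
    gold_count = Counter()  # samples per class in the gold standard
    pred_count = Counter()  # samples per class in the prediction

    for g, p in zip(gold, predicted):
        gold_count[g] += 1
        pred_count[p] += 1
        if g == p:
            true_pos[g] += 1

    classes = sorted(gold_count)
    # Precision: correct assignments / all assignments to that class
    # (defined as 0.0 when the class is never predicted).
    precisions = [
        true_pos[c] / pred_count[c] if pred_count[c] else 0.0 for c in classes
    ]
    # Recall: correct assignments / all gold members of that class.
    recalls = [true_pos[c] / gold_count[c] for c in classes]

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n


# Toy example with made-up labels (not the cacao populations).
gold = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "B", "B", "C"]
precision, recall = macro_precision_recall(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Macro-averaging weights every population equally regardless of its sample count; a micro-average over all samples would instead favor the larger populations.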
Pages: 11