Cotton pan-genome retrieves the lost sequences and genes during domestication and selection

Cited by: 106
Authors
Li, Jianying [1]
Yuan, Daojun [2]
Wang, Pengcheng [1]
Wang, Qiongqiong [1]
Sun, Mengling [1]
Liu, Zhenping [1]
Si, Huan [1]
Xu, Zhongping [1]
Ma, Yizan [1]
Zhang, Boyang [1]
Pei, Liuling [1]
Tu, Lili [1]
Zhu, Longfu [1]
Chen, Ling-Ling [3]
Lindsey, Keith [4]
Zhang, Xianlong [1]
Jin, Shuangxia [1]
Wang, Maojun [1]
Affiliations
[1] Huazhong Agr Univ, Natl Key Lab Crop Genet Improvement, Wuhan, Peoples R China
[2] Huazhong Agr Univ, Coll Plant Sci & Technol, Wuhan, Peoples R China
[3] Huazhong Agr Univ, Coll Informat, Hubei Key Lab Agr Bioinformat, Wuhan, Peoples R China
[4] Univ Durham, Dept Biosci, Durham, England
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Cotton; Domestication; Improvement; Pan-genome; Copy number variation (CNV); Presence/absence variation (PAV); Gene loss; POPULATION-STRUCTURE; FIBER QUALITY; ASSOCIATION; DIVERSITY; INSIGHTS; REVEAL; RICE; WILD; DIFFERENTIATION; DIVERGENCE
DOI
10.1186/s13059-021-02351-w
Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Millennia of directional human selection has reshaped the genomic architecture of cultivated cotton relative to wild counterparts, but we have limited understanding of the selective retention and fractionation of genomic components. Results: We construct a comprehensive genomic variome based on 1961 cottons and identify 456 Mb and 357 Mb of sequence with domestication and improvement selection signals and 162 loci, 84 of which are novel, including 47 loci associated with 16 agronomic traits. Using pan-genome analyses, we identify 32,569 and 8851 non-reference genes lost from Gossypium hirsutum and Gossypium barbadense reference genomes respectively, of which 38.2% (39,278) and 14.2% (11,359) of genes exhibit presence/absence variation (PAV). We document the landscape of PAV selection accompanied by asymmetric gene gain and loss and identify 124 PAVs linked to favorable fiber quality and yield loci. Conclusions: This variation repertoire points to genomic divergence during cotton domestication and improvement, which informs the characterization of favorable gene alleles for improved breeding practice using a pan-genome-based approach.
Pages: 26