Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen

Times cited: 21
Authors
Badet, Thomas [1]
Fouche, Simone [1,2]
Hartmann, Fanny E. [3]
Zala, Marcello [2]
Croll, Daniel [1]
Affiliations
[1] Univ Neuchatel, Inst Biol, Lab Evolutionary Genet, Neuchatel, Switzerland
[2] Swiss Fed Inst Technol, Inst Integrat Biol, Plant Pathol, Zurich, Switzerland
[3] Univ Paris Saclay, Univ Paris Sud, CNRS, AgroParisTech, Ecol Systemat Evolut, Batiment 360, Orsay, France
Funding
Swiss National Science Foundation
Keywords
ZYMOSEPTORIA-TRITICI; SENSITIVITY REVEALS; EVOLUTION; RECOMBINATION; MECHANISMS; INSIGHTS; IMPACT; GENES; MELANIZATION; CHROMOSOME
DOI
10.1038/s41467-021-23862-x
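Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]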
Subject classification codes
07; 0710; 09
Abstract
Species harbor extensive structural variation underpinning recent adaptive evolution. However, the causality between genomic features and the induction of new rearrangements is poorly established. Here, we analyze a global set of telomere-to-telomere genome assemblies of a fungal pathogen of wheat to establish a nucleotide-level map of structural variation. We show that the recent emergence of pesticide resistance has been disproportionately driven by rearrangements. We use machine learning to train a model on structural variation events based on 30 chromosomal sequence features. We show that base composition and gene density are the major determinants of structural variation. Retrotransposons explain most inversion, indel and duplication events. We apply our model to Arabidopsis thaliana and show that our approach extends to more complex genomes. Finally, we analyze complete genomes of haploid offspring in a four-generation pedigree. Meiotic crossover locations are enriched for new rearrangements, consistent with crossovers being mutational hotspots. The model trained on species-wide structural variation accurately predicts the position of >74% of newly generated variants along the pedigree. This predictive power highlights a causal relationship between specific sequence features and the induction of chromosomal rearrangements. Our work demonstrates that models trained on sequence-derived features can accurately identify regions of intrinsic DNA instability in eukaryotic genomes.
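The abstract describes training a model on per-region chromosomal sequence features, with base composition and gene density emerging as the major determinants. As a minimal sketch only (not the authors' pipeline), the Python below tiles a sequence into fixed-size windows, computes two such features (GC fraction and the number of overlapping genes), and labels each window by whether it contains a structural-variant breakpoint. The window size, gene intervals, breakpoint positions, and all function and class names are hypothetical; the other features and the model itself are not shown.

```python
from dataclasses import dataclass


@dataclass
class WindowFeatures:
    start: int          # 0-based window start (inclusive)
    end: int            # window end (exclusive)
    gc_fraction: float  # base-composition feature
    gene_count: int     # gene-density feature: genes overlapping the window
    has_sv: int         # label: 1 if an SV breakpoint falls in the window


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases; other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def window_features(seq, genes, breakpoints, window=1000):
    """Tile seq into non-overlapping windows and return one feature row per window.

    genes: hypothetical gene annotation as half-open (start, end) intervals
    breakpoints: hypothetical 0-based positions of SV breakpoints
    """
    rows = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        # Count genes whose half-open interval overlaps [start, end).
        gene_count = sum(1 for g_start, g_end in genes if g_start < end and g_end > start)
        # Label the window positive if any breakpoint lies inside it.
        has_sv = int(any(start <= pos < end for pos in breakpoints))
        rows.append(WindowFeatures(start, end, gc_fraction(seq[start:end]), gene_count, has_sv))
    return rows


if __name__ == "__main__":
    toy_seq = "ATGC" * 5 + "AAAA" * 5  # 40 bp toy chromosome
    for row in window_features(toy_seq, genes=[(0, 5), (15, 25)], breakpoints=[30], window=20):
        print(row)
```

On the toy input, the first window has a GC fraction of 0.5, two overlapping genes, and no breakpoint, while the second has a GC fraction of 0.0, one gene, and the breakpoint at position 30. Rows of this shape, extended with further features, are the kind of table a classifier could be trained on.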
Pages: 14