Expanding an expanded genome: long-read sequencing of Trypanosoma cruzi

Cited by: 109
Authors
Berna, Luisa [1]
Rodriguez, Matias [2]
Chiribao, Maria Laura [1,3]
Parodi-Talice, Adriana [1,4]
Pita, Sebastian [1,4]
Rijo, Gaston [1]
Alvarez-Valin, Fernando [2]
Robello, Carlos [1,3]
Affiliations
[1] Inst Pasteur Montevideo, Lab Host Pathogen Interact UBM, Montevideo, Uruguay
[2] Fac Ciencias UDELAR, Secc Biomatemat, Unidad Genom Evolut, Montevideo, Uruguay
[3] Fac Med UDELAR, Dept Bioquim, Montevideo, Uruguay
[4] Fac Ciencias UDELAR, Secc Genet, Montevideo, Uruguay
Keywords
Trypanosoma cruzi; PacBio; whole genome sequencing; Chagas disease; molecular characterization; trans-sialidase; satellite DNA; mucin genes; protein; organization; performance; diversity; alignment; parasite
DOI
10.1099/mgen.0.000177
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Although the genome of Trypanosoma cruzi, the causative agent of Chagas disease, was first made available in 2005, with additional strains reported later, the intrinsic complexity of this parasite's genome (an abundance of repetitive sequences and genes organized in tandem) has traditionally hindered high-quality genome assembly and annotation. This also limits the many types of analysis that require a high degree of precision. Long reads generated by third-generation sequencing technologies are particularly well suited to the challenges posed by the T. cruzi genome, since they permit direct determination of the full sequence of large clusters of repetitive sequences without collapsing them. This, in turn, not only allows accurate estimation of gene copy numbers but also circumvents assembly fragmentation. Here, we present the analysis of the genome sequences of two T. cruzi clones: the hybrid TCC (TcVI) and the non-hybrid Dm28c (TcI), determined by PacBio Single Molecule Real-Time (SMRT) technology. The improved assemblies obtained here permitted us to accurately estimate gene copy numbers and the abundance and distribution of repetitive sequences (including satellites and retroelements). We found that the genome of T. cruzi is composed of a 'core compartment' and a 'disruptive compartment', which exhibit opposite GC content and gene composition. Novel tandem and dispersed repetitive sequences were identified, including some located inside coding sequences. Additionally, homologous chromosomes were assembled separately, allowing us to retrieve haplotypes as separate contigs instead of a single mosaic sequence. Finally, manual annotation of the surface multigene families, mucins and trans-sialidases, now allows a better overview of these complex groups of genes.
Pages: 19