Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies

Cited by: 20
Authors
Beauclair, Linda [1 ]
Rame, Christelle [1 ]
Arensburger, Peter [2 ]
Piegu, Benoit [1 ]
Guillou, Florian [1 ]
Dupont, Joelle [1 ]
Bigot, Yves [1 ]
Affiliations
[1] Ctr INRA Val Loire, CNRS 7247, UMR INRA0085, PRC, F-37380 Nouzilly, France
[2] Calif State Polytech Univ Pomona, Dept Biol Sci, Pomona, CA 91768 USA
Keywords
G-quadruplex; genome; Illumina; PacBio; repeats; G-QUADRUPLEX STRUCTURES; CHICKEN GENOME; HIDDEN GENES; IDENTIFICATION; EXPRESSION; INSIGHT; PROVIDE; TRAITS
DOI
10.1186/s12864-019-6131-1
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: More and more eukaryotic genomes are being sequenced and assembled, most of them presented as a complete model in which missing chromosomal regions are filled with Ns and from which a few chromosomes may be lacking. Avian genomes often contain sequences with high GC content, which has been hypothesized to be the reason many sequences are missing from these genomes. We investigated features of these missing sequences to discover why some may not have been integrated into genomic libraries and/or sequenced.
Results: The sequences of five red jungle fowl cDNA models with high GC content were used as queries to search publicly available datasets of Illumina and PacBio sequencing reads. These reads were used to reconstruct the leptin, TNF alpha, MRPL52, PCP2 and PET100 genes, all of which are absent from the red jungle fowl genome model. These gene sequences displayed elevated GC contents, had intron sizes that were sometimes larger than those of non-avian orthologues, and had non-coding regions that contained numerous tandem and inverted repeat sequences with motifs able to assemble into stable G-quadruplexes and intrastrand dyadic structures. Our results suggest that Illumina technology was unable to sequence the non-coding regions of these genes. PacBio technology, on the other hand, was able to sequence these regions, but with dramatically lower efficiency than would typically be expected.
Conclusions: High GC content was not the principal reason why numerous GC-rich regions of avian genomes are missing from genome assembly models. Instead, the presence of tandem repeats containing motifs capable of assembling into very stable secondary structures is likely responsible.
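The abstract turns on two sequence properties: GC content and the presence of motifs that can form G-quadruplexes. As a rough illustration only, the sketch below computes the GC fraction of a sequence and scans it for the widely used putative-quadruplex pattern of four runs of at least three G separated by loops of 1 to 7 nucleotides. The pattern, the function names `gc_content` and `find_g4_motifs`, and the loop-length limits are assumptions made for this example; the record does not say which tools or thresholds the authors used.

```python
import re

# Assumed pattern (not taken from the paper): four G-runs of length >= 3
# separated by three loops of 1-7 nucleotides each.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        # Sequence is empty or made only of ambiguous symbols such as N.
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def find_g4_motifs(seq):
    """Return (start, end, motif) for each non-overlapping pattern match."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    example = "AAGGGTGGGTGGGTGGGAA"  # made-up sequence for illustration
    print(gc_content(example))
    print(find_g4_motifs(example))
```

This scans only the strand it is given; a motif on the complementary strand appears as C-runs and would need the reverse complement to be scanned as well. A regular-expression hit marks a candidate only and says nothing about how stable the structure would actually be.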
Pages: 16