Genome re-sequencing and reannotation of the Escherichia coli ER2566 strain and transcriptome sequencing under overexpression conditions

Times cited: 8
Authors
Zhou, Lizhi [1]
Yu, Hai [1]
Wang, Kaihang [2]
Chen, Tingting [2]
Ma, Yue [2]
Huang, Yang [2]
Li, Jiajia [2]
Liu, Liqin [2]
Li, Yuqian [2]
Kong, Zhibo [2]
Zheng, Qingbing [1]
Wang, Yingbin [1]
Gu, Ying [1, 2]
Xia, Ningshao [1, 2]
Li, Shaowei [1, 2]
Affiliations
[1] Xiamen Univ, Sch Publ Hlth, State Key Lab Mol Vaccinol & Mol Diagnost, Xiamen 361102, Fujian, Peoples R China
[2] Xiamen Univ, Sch Life Sci, Natl Inst Diagnost & Vaccine Dev Infect Dis, Xiamen 361102, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Escherichia coli ER2566; Genome reannotation; Transcriptome sequencing; Engineer bacteria; ANNOTATION; ALGORITHM; DNA
DOI
10.1186/s12864-020-06818-1
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative strain and has been widely used in recombinant protein expression. However, like many other current RefSeq annotations, the annotation of the ER2566 strain was incomplete, with missing gene names and miscellaneous RNAs, as well as uncorrected annotations of some pseudogenes. Here, we performed a systematic reannotation of the ER2566 genome by combining multiple annotation tools with manual revision to provide a comprehensive understanding of the E. coli ER2566 strain, and used high-throughput sequencing to explore how the strain adapted under external pressure.

Results: The reannotation included noteworthy corrections to all protein-coding genes, led to the exclusion of 190 hypothetical genes or pseudogenes, and resulted in the addition of 237 coding sequences, 230 miscellaneous noncoding RNAs, and 2 tRNAs. In addition, we manually examined all 194 pseudogenes in the RefSeq annotation and directly identified 123 (63%) as coding genes. We then used whole-genome sequencing and high-throughput RNA sequencing to assess mutational adaptations under consecutive subculture or overexpression burden. Whereas no mutations were detected in response to consecutive subculture, overexpression of the human papillomavirus type 16 capsid led to the identification of a mutation in the transcribed RNA (position 1,094,824, within the 3′ non-coding region, 19 bp from the lacI gene) that was not detected at the genomic level by Sanger sequencing.

Conclusions: The ER2566 strain is used by both the general scientific community and the biotechnology industry. Reannotation of the E. coli ER2566 strain not only improved the RefSeq data but also uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor. We propose that our pipeline might offer a universal method for the reannotation of other bacterial genomes with high speed and accuracy. This study might facilitate a better understanding of gene function for the ER2566 strain under external burden and provide more clues for engineering bacteria for biotechnological applications.
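As a quick arithmetic check on the counts quoted in the abstract, the following minimal Python sketch recomputes the share of RefSeq pseudogenes reclassified as coding genes and sums the newly added features. The variable and function names are illustrative assumptions and are not taken from the authors' pipeline; only the numbers come from the abstract.

```python
# Counts quoted in the abstract; all names below are illustrative only.
refseq_pseudogenes = 194        # pseudogenes examined manually
reclassified_as_coding = 123    # of those, identified as coding genes

added_features = {
    "coding sequences": 237,
    "miscellaneous noncoding RNAs": 230,
    "tRNAs": 2,
}

def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

share = percent(reclassified_as_coding, refseq_pseudogenes)
added_total = sum(added_features.values())

print(f"Pseudogenes reclassified as coding: {share}%")
print(f"Features added by the reannotation: {added_total}")
```

The rounded share agrees with the 63% reported in the abstract. The total is a plain sum of the three listed categories and says nothing about overlap with the 190 excluded entries.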
Pages: 11