Full-length transcriptome sequencing on PacBio platform

Authors
Ren Y. [1]
Zhang J. [1]
Sun Y. [1]
Wu Z. [2]
Ruan J. [2]
He B. [1]
Liu G. [1]
Gao S. [1]
Bu W. [1]
Affiliations
[1] College of Life Sciences, Nankai University, Tianjin
[2] College of Mathematics, Nankai University, Tianjin
Source
Gao, Shan (gao_shan@mail.nankai.edu.cn) | Year: 1600 / Volume: Chinese Academy of Sciences / Issue: 61
Keywords
Full-length transcriptome; PacBio; Quality control; Single molecule sequencing; Standard protocol;
DOI
10.1360/N972015-01384
Abstract
Next-generation sequencing (NGS) technology, particularly the Illumina platform, has produced most of the available animal and plant transcriptomes, but the short reads from NGS sequencers yield incompletely assembled transcripts that lack important information (e.g., alternative splicing). This limits the understanding of transcriptome data. Based on single-molecule real-time (SMRT) sequencing technology, the PacBio platform can provide longer and even full-length transcripts that originate from observations of single molecules, without assembly. Full-length transcripts can be used to investigate alternative splicing, alternative polyadenylation, novel genes, non-coding RNAs, fusion transcripts, and so on. By the end of 2015, the transcriptomes of a few species had been sequenced on the PacBio platform. They are classified into three groups. The first group includes human lymphoblastoid cells and Salvia miltiorrhiza, sequenced with a combination of NGS short reads and SMRT technology. The second group includes HIV-1, bovine immunoglobulin G, human embryonic stem cells, mouse neurexins and Propithecus coquereli, sequenced with SMRT alone. The third group includes European cuttlefish, tetraploid cotton and fungi, sequenced with SMRT and analyzed with IsoSeq, the latest PacBio pipeline for full-length transcriptome data. The SMARTer PCR cDNA Synthesis Kit and the IsoSeq data analysis pipeline have been recommended for full-length transcriptome sequencing. However, data quality can be affected by ribosomal RNA contamination, cross-contamination on agarose gels, the effect of size selection using gel or BluePippin, the prevalence of PCR chimeras and the incorrect removal of SMRTbell adapters. Although IsoSeq can remove artificial concatemers produced by an insufficient amount of SMRTbell adapters during sequencing library preparation, some problems still exist. For example, IsoSeq cannot distinguish PCR chimeras from true fusion genes.
Another critical problem is the misidentification of the 5' and 3' primers, caused by sequencing errors or by partial trimming of the primers as SMRTbell adapters. This can assign the wrong strand to transcripts in downstream analysis. In addition, transcripts of the same gene are difficult to cluster without a reference genome as a guide. It is therefore necessary to standardize the experimental and data analysis protocols and to design quality control measures before full-length transcriptome sequencing is applied on a large scale. In this study, we sequenced the first full-length insect transcriptome, using Erthesina fullo Thunberg as material. Seven SMRT cells on a PacBio RS II sequencer produced 381,394 reads with an average length of 16,262 bp. In total, 6 Gbp of effective data were used to optimize experimental parameters, design quality control measures and standardize protocols with the new PacBio reagents (P6/C4). Some of the results of this study are reported to provide useful information for better understanding full-length transcriptome sequencing and for designing experiments. © 2016, Science Press. All rights reserved.
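As a rough consistency check on the throughput figures quoted in the abstract, the following minimal Python sketch multiplies the read count by the average read length and compares the product with the stated 6 Gbp. It assumes the 16,262 bp figure is a mean over all 381,394 reads; the abstract does not say which read type is meant, and all variable and function names here are illustrative, not taken from the paper.

```python
# Figures quoted in the abstract.
N_READS = 381_394             # reads from all SMRT cells combined
MEAN_READ_LENGTH_BP = 16_262  # average read length (assumed to be a mean over all reads)
N_SMRT_CELLS = 7              # SMRT cells run on the PacBio RS II


def total_yield_bp(n_reads: int, mean_length_bp: int) -> int:
    """Total sequenced bases implied by a read count and a mean read length."""
    return n_reads * mean_length_bp


total_bp = total_yield_bp(N_READS, MEAN_READ_LENGTH_BP)
total_gbp = total_bp / 1e9
reads_per_cell = N_READS / N_SMRT_CELLS
gbp_per_cell = total_gbp / N_SMRT_CELLS

print(f"Implied total yield: {total_gbp:.2f} Gbp")
print(f"Average reads per SMRT cell: {reads_per_cell:,.0f}")
print(f"Average yield per SMRT cell: {gbp_per_cell:.2f} Gbp")
```

Under that assumption the product comes to roughly 6.2 Gbp, in line with the approximately 6 Gbp of effective data reported.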
Pages: 1250-1254
Page count: 4
Related papers
13 in total
  • [1] Gao S., Ou J.H., Xiao K., Using R and Bioconductor in Bioinformatics, pp. 33-34, (2014)
  • [2] Hagen T., Fabian G., Donald S., et al., Defining a personal, allele-specific, and single-molecule long-read transcriptome, Proc Natl Acad Sci USA, 111, pp. 9869-9874, (2014)
  • [3] Xu Z., Peters R.J., Weirather J., et al., Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis, Plant J, 82, pp. 951-961, (2015)
  • [4] Kin F.A., Vittorio S., Pegah T.A., et al., Characterization of the human ESC transcriptome by hybrid sequencing, Proc Natl Acad Sci USA, 110, pp. E4821-E4830, (2013)
  • [5] Sharon D., Tilgner H., Grubert F., et al., A single-molecule long-read survey of the human transcriptome, Nat Biotechnol, 31, pp. 1009-1014, (2013)
  • [6] Ocwieja K.E., Sherrill-Mix S., Mukherjee R., et al., Dynamic regulation of HIV-1 mRNA populations analyzed by single-molecule enrichment and long-read sequencing, Nucleic Acids Res, 40, pp. 10345-10355, (2012)
  • [7] Larsen P.A., Smith T.P., Application of circular consensus sequencing and network analysis to characterize the bovine IgG repertoire, BMC Immunol, 13, (2012)
  • [8] Barbara T., Ozgun G., Stephen R.Q., et al., Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing, Proc Natl Acad Sci USA, 111, pp. E1291-E1299, (2014)
  • [9] Larsen P.A., Campbell C.R., Yoder A.D., Next-generation approaches to advancing eco-immunogenomic research in critically endangered primates, Mol Ecol Resour, 14, pp. 1198-1209, (2014)
  • [10] Sean P.G., Elizabeth T., Asaf S., et al., Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing, PLoS One, 10, (2015)