Probabilistic error correction for RNA sequencing

Times cited: 49
Authors
Le, Hai-Son [1]
Schulz, Marcel H. [2]
McCauley, Brenna M. [3]
Hinman, Veronica F. [3]
Bar-Joseph, Ziv [1,2]
Affiliations
[1] Carnegie Mellon Univ, Machine Learning Dept, Pittsburgh, PA 15217 USA
[2] Carnegie Mellon Univ, Lane Ctr Computat Biol, Pittsburgh, PA 15217 USA
[3] Carnegie Mellon Univ, Dept Biol Sci, Pittsburgh, PA 15217 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
SEQ DATA; REGULATORY NETWORK; GENE-EXPRESSION; EFFICIENT; NORMALIZATION; ALIGNMENT; READS
DOI
10.1093/nar/gkt215
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.
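The abstract describes SEECER only at a high level: many HMMs are learned and then used to correct sequencing errors. The Python sketch below is a hypothetical, greatly simplified illustration of the underlying idea, not SEECER's implementation. It assumes a cluster of equal-length reads that are already aligned with no insertions or deletions. It estimates per-position base probabilities, roughly the match-state emission probabilities of a profile HMM, without modeling insertions, deletions or transitions. It then replaces a base only when that base's probability under the profile falls below a threshold. The function names `build_profile` and `correct_read`, the pseudocount of 1 and the 0.2 threshold are illustrative assumptions.

```python
from collections import Counter

BASES = "ACGT"


def build_profile(reads, pseudocount=1.0):
    """Estimate per-position base probabilities from pre-aligned, equal-length reads.

    Hypothetical simplification: no insertion/deletion states and no transitions.
    """
    length = len(reads[0])
    profile = []
    for pos in range(length):
        # Only A/C/G/T contribute to the total; other symbols (e.g. N) are ignored.
        counts = Counter(read[pos] for read in reads)
        total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
        # Pseudocounts keep every base at a small non-zero probability.
        profile.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return profile


def correct_read(read, profile, threshold=0.2):
    """Replace bases that are improbable under the profile with the most likely base."""
    corrected = []
    for pos, base in enumerate(read):
        probs = profile[pos]
        best = max(BASES, key=lambda b: probs[b])
        # The test uses a relative frequency, not a raw count, so it does not depend on coverage depth.
        if probs.get(base, 0.0) < threshold:
            corrected.append(best)
        else:
            corrected.append(base)
    return "".join(corrected)


if __name__ == "__main__":
    cluster = ["ACGTAC"] * 9 + ["ACGAAC"]  # one read carries an apparent error at position 3
    print(correct_read("ACGAAC", build_profile(cluster)))  # ACGTAC
```

Because the rule compares relative frequencies within a cluster, it behaves the same for highly and lowly expressed transcripts. A site where two alleles each appear in a substantial fraction of reads is left unchanged. This offers one intuition for why modeling each read cluster probabilistically can help separate sequencing errors from non-uniform abundance and polymorphisms, the problems the abstract highlights.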
Pages: 11