De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set

Cited by: 66
Author
Singhal, Sonal [1, 2]
Affiliations
[1] Univ Calif Berkeley, Museum Vertebrate Zool, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA 94720 USA
Funding
National Science Foundation (USA)
Keywords
annotation; de novo assembly; suture zones; transcriptomes; variant discovery; RNA-SEQ; REDUCED REPRESENTATION; STATISTICAL FRAMEWORK; READ ALIGNMENT; GENERATION; DISCOVERY; GENOME; PROTEIN; BLAST; ALGORITHMS;
DOI
10.1111/1755-0998.12077
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease of obtaining such data, using these data effectively still poses notable challenges, especially for those working with organisms without a high-quality reference genome. For every stage of analysis, from assembly to annotation to variant discovery, researchers have to distinguish technical artefacts from the biological realities of their data before they can make inferences. In this work, I explore these challenges by generating a large de novo comparative transcriptomic data set for a clade of lizards and constructing a pipeline to analyse these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas of improvement, and propose ways to minimize these errors. I find that with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms.
Pages: 403-416
Number of pages: 14