The simple fool's guide to population genomics via RNA-Seq: an introduction to high-throughput sequencing data analysis

Cited by: 189
Authors
De Wit, Pierre [1]
Pespeni, Melissa H. [1, 2]
Ladner, Jason T. [1]
Barshis, Daniel J. [1]
Seneca, Francois [1]
Jaris, Hannah [1]
Therkildsen, Nina Overgaard [3]
Morikawa, Megan [4]
Palumbi, Stephen R. [1]
Affiliations
[1] Stanford Univ, Hopkins Marine Stn, Dept Biol, Pacific Grove, CA 93950 USA
[2] Indiana Univ, Dept Biol, Bloomington, IN 47405 USA
[3] Tech Univ Denmark, Natl Inst Aquat Resources, DK-8600 Silkeborg, Denmark
[4] Duke Univ, Durham, NC 27708 USA
Funding
U.S. National Science Foundation
Keywords
bioinformatics; de novo assembly; gene expression; population genomics; RNA-Seq; SNP detection; TRANSCRIPTOME; DISCOVERY; TOOL; FRAMEWORK; TECHNOLOGIES; EVOLUTION; ALIGNMENT; READS
DOI
10.1111/1755-0998.12003
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
High-throughput sequencing technologies are currently revolutionizing the field of biology and medicine, yet bioinformatic challenges in analysing very large data sets have slowed the adoption of these technologies by the community of population biologists. We introduce the Simple Fool's Guide to Population Genomics via RNA-seq (SFG), a document intended to serve as an easy-to-follow protocol, walking a user through one example of high-throughput sequencing data analysis of nonmodel organisms. It is by no means an exhaustive protocol, but rather serves as an introduction to the bioinformatic methods used in population genomics, enabling a user to gain familiarity with basic analysis steps. The SFG consists of two parts. This document summarizes the steps needed and lays out the basic themes for each and a simple approach to follow. The second document is the full SFG, publicly available at http://sfg.stanford.edu, that includes detailed protocols for data processing and analysis, along with a repository of custom-made scripts and sample files. Steps included in the SFG range from tissue collection to de novo assembly, blast annotation, alignment, gene expression, functional enrichment, SNP detection, principal components and FST outlier analyses. Although the technical aspects of population genomics are changing very quickly, our hope is that this document will help population biologists with little to no background in high-throughput sequencing and bioinformatics to more quickly adopt these new techniques.
Pages: 1058-1067
Page count: 10