CoRAL: predicting non-coding RNAs from small RNA-sequencing data

Citations: 20
Authors
Leung, Yuk Yee [1,2]
Ryvkin, Paul [2,3]
Ungar, Lyle H. [2,3,4]
Gregory, Brian D. [3,5,6]
Wang, Li-San [1,2,3,5,7]
Affiliations
[1] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Penn Ctr Bioinformat, Philadelphia, PA 19104 USA
[3] Univ Penn, Perelman Sch Med, Genom & Computat Biol Grad Grp, Philadelphia, PA 19104 USA
[4] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[5] Univ Penn, Perelman Sch Med, Penn Genome Frontiers Inst, Philadelphia, PA 19104 USA
[6] Univ Penn, Dept Biol, Philadelphia, PA 19104 USA
[7] Univ Penn, Perelman Sch Med, Inst Aging, Philadelphia, PA 19104 USA
Funding
US National Science Foundation;
Keywords
INTEGRATIVE ANNOTATION; REVEALS; CLASSIFICATION; EXPRESSION; MICRORNAS; GENES;
DOI
10.1093/nar/gkt426
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with approximately 80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.
Pages: 10