miR-BAG: Bagging Based Identification of MicroRNA Precursors

Cited by: 15
Authors
Jha, Ashwani [1]
Chauhan, Rohit [1]
Mehra, Mrigaya [1]
Singh, Heikham Russiachand [1]
Shankar, Ravi [1]
Affiliations
[1] CSIR IHBT, Studio Computat Biol & Bioinformat Biotechnol Div, Palampur, Himachal Pradesh, India
Keywords
DEEP SEQUENCING DATA; HUMAN GENOME; COMPUTATIONAL IDENTIFICATION; GENE-EXPRESSION; PREDICTION; CLASSIFICATION; TOOL; SOFTWARE; PROTEINS; HUNDREDS;
DOI
10.1371/journal.pone.0045782
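Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract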
Non-coding elements such as miRNAs play key regulatory roles in living systems. These ultra-short (~21 bp) RNA molecules are derived from hairpin precursors and usually participate in negative gene regulation by binding their target mRNAs. Discovering miRNA candidate regions across the genome has been a challenging problem, and most existing tools work reliably only on limited datasets. Here, we present a novel, reliable approach, miR-BAG, developed to identify miRNA candidate regions in genomes by scanning sequences as well as by using next-generation sequencing (NGS) data. miR-BAG uses a bootstrap aggregation (bagging) based machine learning approach, creating an ensemble of complementary learners to attain high accuracy while balancing sensitivity and specificity. miR-BAG was developed for a wide range of species and tested extensively on a wide range of experimentally validated data. Considering the position-specific variation of triplet structural profiles and mature-miRNA-anchored structural profiles had a positive impact on performance. miR-BAG's performance was consistent, with accuracy observed to be >90% for most of the species considered in the present study. In a detailed comparative analysis, miR-BAG performed better than six existing tools. Using the miR-BAG NGS module, we identified a total of 22 novel miRNA candidate regions in the cow genome, in addition to a total of 42 cow-specific miRNA regions. In practice, discovering miRNA regions in a genome demands high-throughput data analysis and a large amount of processing. Considering this, miR-BAG has been developed with a multi-threaded parallel architecture, as a web server as well as a user-friendly standalone GUI version.
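The abstract names bootstrap aggregation (bagging) but does not describe the base learners or features, so the following is only a minimal, generic sketch of the technique and not the miR-BAG implementation. It assumes decision stumps as stand-in base learners, majority voting as the aggregation rule, and two hypothetical numeric features per candidate hairpin (for example, a stability score and a paired-base fraction). All function names and the toy data are illustrative.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-feature threshold classifier (decision stump) by exhaustive search."""
    best = None
    n_features = len(samples[0][0])
    for f in range(n_features):
        for x, _ in samples:
            thr = x[f]
            for sign in (1, -1):
                # Count training errors for the rule "predict 1 if sign*(x[f]-thr) >= 0".
                errors = sum(
                    1 for xi, yi in samples
                    if (1 if sign * (xi[f] - thr) >= 0 else 0) != yi
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, sign)
    _, f, thr, sign = best
    return lambda x: 1 if sign * (x[f] - thr) >= 0 else 0

def bagging_fit(samples, n_learners=25, seed=0):
    """Train each learner on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        boot = [rng.choice(samples) for _ in samples]
        learners.append(train_stump(boot))
    return learners

def bagging_predict(learners, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): label 1 = precursor-like, label 0 = background.
train = [
    ((0.80, 0.90), 1), ((0.85, 0.80), 1), ((0.90, 0.95), 1), ((0.95, 0.85), 1),
    ((0.10, 0.20), 0), ((0.15, 0.10), 0), ((0.20, 0.25), 0), ((0.25, 0.15), 0),
]
learners = bagging_fit(train)
print(bagging_predict(learners, (0.97, 0.90)))
print(bagging_predict(learners, (0.05, 0.10)))
```

Because every learner sees a different resample, individual errors tend to be uncorrelated, and the vote smooths them out. This is the general motivation for bagging when sensitivity and specificity must be balanced.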
Pages: 15