Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy

Cited by: 152
Authors
Liu, Bin [1 ,2 ]
Fang, Longyun [1 ]
Wang, Shanyi [1 ]
Wang, Xiaolong [1 ,2 ]
Li, Hongtao [4 ]
Chou, Kuo-Chen [3 ,5 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[2] Harbin Inst Technol, Shenzhen Grad Sch, Key Lab Network Oriented Intelligent Computat, Shenzhen 518055, Guangdong, Peoples R China
[3] Gordon Life Sci Inst, Boston, MA USA
[4] Stn Ocean Adm Wendeng, Wendeng Marine Environm Monitoring Stn, Weihai, Shandong, Peoples R China
[5] King Abdulaziz Univ, CEGMR, Jeddah 21589, Saudi Arabia
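Funding
National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program);
Keywords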
MicroRNA precursor; True pre-miRNA; False pre-miRNA; Degenerate Kmer; deKmer web-server; Long-range effect; AMINO-ACID-COMPOSITION; SEQUENCE-BASED PREDICTOR; PSEUDO TRINUCLEOTIDE COMPOSITION; FUNCTIONAL DOMAIN COMPOSITION; PROTEIN SUBCELLULAR LOCATION; LABEL LEARNING CLASSIFIER; SUPPORT VECTOR MACHINES; NUCLEOTIDE COMPOSITION; WEB SERVER; PHYSICOCHEMICAL PROPERTIES;
DOI
10.1016/j.jtbi.2015.08.025
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
The microRNA (miRNA), a small non-coding RNA molecule, plays an important role in transcriptional and post-transcriptional regulation of gene expression. Its abnormal expression, however, has been observed in many cancers and other disease states, implying that miRNA molecules are also deeply involved in these diseases, particularly in carcinogenesis. It is therefore important for both basic research and miRNA-based therapy to discriminate real pre-miRNAs from false ones (such as hairpin sequences with similar stem-loops). Most existing methods for this task formulate each RNA sample as a vector of its Kmer components. The Kmers must be very short, however; otherwise the vector's dimension becomes extremely large, leading to the "high-dimension disaster" or overfitting problem. Inspired by the concept of "degenerate energy levels" in quantum mechanics, we introduced the "degenerate Kmer" (deKmer) to represent RNA samples. By doing so, we can not only accommodate long-range coupling effects but also avoid the high-dimension problem. Rigorous jackknife tests and cross-species experiments indicated that our approach is very promising. It has not escaped our notice that the deKmer approach can also be applied to many other areas of computational biology. A user-friendly web-server for the new predictor has been established at http://bioinformatics.hitsz.edu.cn/miRNA-deKmer/, by which users can easily get their desired results. © 2015 Elsevier Ltd. All rights reserved.
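The dimensionality problem described in the abstract can be illustrated with a minimal sketch of plain Kmer composition. This is not the authors' deKmer method, which this record does not define; the function name `kmer_composition` and the choice of the alphabet `ACGU` are assumptions made here for illustration only. The point is that the feature vector has 4^k entries, so it grows quickly as k increases.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; not specified in the record


def kmer_composition(seq, k):
    """Return the normalized Kmer frequency vector of `seq` (length 4**k).

    Hypothetical helper for illustration; windows that contain symbols
    outside ALPHABET are counted in the total but have no vector entry.
    """
    # Enumerate all 4**k possible Kmers in a fixed lexicographic order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    n_windows = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for short sequences
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    example = "ACGUACGU"
    for k in (1, 2, 3, 4, 5, 6):
        # The vector dimension is 4**k regardless of sequence length.
        print(k, len(kmer_composition(example, k)))
```

With k = 6 the vector already has 4096 entries, which is typically far more than the number of overlapping windows in a single hairpin sequence; this is the sparsity that motivates keeping k small in plain Kmer approaches.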
Pages: 153-159
Number of pages: 7