circRNA-binding protein site prediction based on multi-view deep learning, subspace learning and multi-view classifier

Cited by: 21
Authors
Li, Hui [1]
Deng, Zhaohong [2,3]
Yang, Haitao [1]
Pan, Xiaoyong [4]
Wei, Zhisheng [5,6]
Shen, Hong-Bin [7]
Choi, Kup-Sze [8]
Wang, Lei [5,6]
Wang, Shitong [9]
Wu, Jing [5,6]
Affiliations
[1] Jiangnan Univ, Wuxi 214012, Jiangsu, Peoples R China
[2] Jiangnan Univ, Key Lab Computat Neurosci & Brain Inspired Intell, Sch Artificial Intelligence & Comp Sci, Wuxi, Jiangsu, Peoples R China
[3] Jiangnan Univ, ZJLab, Sch Artificial Intelligence & Comp Sci, Wuxi, Jiangsu, Peoples R China
[4] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
[5] Jiangnan Univ, Sch Biotechnol, Wuxi, Jiangsu, Peoples R China
[6] Jiangnan Univ, Key Lab, Ind Biotechnol Minist, Wuxi, Jiangsu, Peoples R China
[7] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[8] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[9] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
circRNA-RBP binding site prediction; deep feature learning; WGCCA; multi-view TSK fuzzy system; CANONICAL CORRELATION-ANALYSIS; CIRCULAR RNAS; FUZZY-LOGIC;
DOI
10.1093/bib/bbab394
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Circular RNAs (circRNAs) generally bind to RNA-binding proteins (RBPs) to play an important role in the regulation of autoimmune diseases. Thus, it is crucial to study the binding sites of RBPs on circRNAs. Although many methods, including traditional machine learning and deep learning, have been developed to predict the interactions between RNAs and RBPs, most of them focus on linear RNAs. At present, few studies have examined the binding relationships between circRNAs and RBPs, so in-depth research is urgently needed. Existing circRNA-RBP binding site prediction methods take circRNA sequences as the main research subject, but they have not fully exploited the relevant characteristics of circRNAs, such as the structure and composition information of circRNA sequences. Some methods have extracted different views to construct recognition models, but how to use multi-view data efficiently for this purpose is still not well studied. To address these problems, this paper proposes a multi-view classification method called DMSK, based on multi-view deep learning, subspace learning and a multi-view classifier, for the identification of circRNA-RBP interaction sites. In the DMSK method, we first converted circRNA sequences into pseudo-amino acid sequences and pseudo-dipeptide components to extract high-dimensional sequence features and component features of circRNAs, respectively. Then, the structure prediction method RNAfold was used to predict the secondary structure of the RNA sequences, and a sequence embedding model was used to extract context-dependent features. Next, we fed the raw features of these four views to a hybrid network, composed of a convolutional neural network and a long short-term memory network, to obtain the deep features of circRNAs. Furthermore, we used view-weighted generalized canonical correlation analysis (WGCCA) to extract the common features of the four views by subspace learning. Finally, the learned subspace common features and the multi-view deep features were used to train the downstream multi-view TSK fuzzy system, yielding a multi-view classifier based on fuzzy rules and fuzzy inference. The trained classifier was used to predict the specific positions of the RBP binding sites on the circRNAs. Experiments show that DMSK achieves better prediction performance than existing methods. The code and dataset of this study are available at https://github.com/Rebecca3150/DMSK.
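To make the final classification stage more concrete, the following is a minimal sketch of first-order TSK (Takagi-Sugeno-Kang) fuzzy inference for a single view, written in plain Python. It is not the DMSK implementation: the function name `tsk_predict`, the Gaussian membership functions, and all rule parameters (centers, widths, consequent coefficients) are assumptions chosen for illustration, and the multi-view weighting and the training procedure described in the abstract are omitted.

```python
import math


def tsk_predict(x, centers, widths, consequents):
    """First-order TSK fuzzy inference for one sample (illustrative sketch).

    x           : list of d feature values
    centers     : K x d Gaussian membership centers (one row per rule)
    widths      : K x d Gaussian membership widths, all > 0
    consequents : K x (d + 1) linear coefficients [bias, w_1, ..., w_d]
    """
    # Firing strength of each rule: product of per-feature Gaussian
    # memberships, computed as exp of the summed negative exponents.
    firing = []
    for c, s in zip(centers, widths):
        exponent = sum(
            (xi - ci) ** 2 / (2.0 * si ** 2) for xi, ci, si in zip(x, c, s)
        )
        firing.append(math.exp(-exponent))

    total = sum(firing)
    if total == 0.0:
        # All rules underflowed to zero: fall back to equal weights.
        normalized = [1.0 / len(firing)] * len(firing)
    else:
        normalized = [f / total for f in firing]

    # Each rule's consequent is a linear function of the input.
    rule_outputs = [
        p[0] + sum(w * xi for w, xi in zip(p[1:], x)) for p in consequents
    ]

    # Final output: firing-strength-weighted average of the rule outputs.
    return sum(n * o for n, o in zip(normalized, rule_outputs))


# Hypothetical two-rule system on two features: rule 1 is centered at (0, 0)
# and outputs 0, rule 2 is centered at (1, 1) and outputs 1.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [[1.0, 1.0], [1.0, 1.0]]
consequents = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

score = tsk_predict([0.9, 0.8], centers, widths, consequents)
label = 1 if score >= 0.5 else 0  # thresholding at 0.5 is an assumption
```

In this toy setup the score moves from below 0.5 toward above 0.5 as the input moves from the first rule's center to the second's, which is the sense in which the rules stay interpretable: each prediction can be traced back to the rules that fired most strongly.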
Pages: 14