MFPINC: prediction of plant ncRNAs based on multi-source feature fusion

Cited by: 1
Authors
Nie, Zhenjun [1]
Gao, Mengqing [1]
Jin, Xiu [1, 2]
Rao, Yuan [1, 2]
Zhang, Xiaodan [1, 2]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Peoples R China
[2] Minist Agr & Rural Affairs, Key Lab Agr Sensors, Hefei 230036, Peoples R China
Keywords
Plants; ncRNA prediction; Fusion of deep feature and sequence feature; FEATURE-SELECTION; NONCODING RNAS; PROTEIN; MECHANISMS;
DOI
10.1186/s12864-024-10439-3
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Non-coding RNAs (ncRNAs) are recognized as pivotal players in the regulation of essential physiological processes in plants, such as nutrient homeostasis, development, and stress responses. Common methods for predicting ncRNAs are strongly affected by experimental conditions and by the choice of computational method, and so require a significant investment of time and resources. We therefore constructed MFPINC, an ncRNA predictor for identifying potential ncRNAs in plants, which builds on the PINC tool proposed in our previous work. Specifically, sequence features were refined using variance thresholding and F-test methods, while deep features were extracted and feature fusion was performed using a GRU model. A comprehensive evaluation on multiple standard datasets shows that MFPINC not only identifies gene sequences more comprehensively and accurately, but also significantly improves the expressive power and generalization performance of the model, and that it significantly outperforms existing competing methods in ncRNA identification. The tool is available on GitHub (https://github.com/Zhenj-Nie/MFPINC), where the data and source code can be downloaded for free.
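The two-stage refinement of sequence features mentioned in the abstract (variance thresholding followed by an F-test) can be illustrated with a minimal sketch. This is not the MFPINC implementation: the function names, the toy feature matrix, the threshold of 0.0, and the choice to keep the top 2 features are all assumptions made here for illustration, and the F-test is taken to be a one-way ANOVA F statistic computed per feature.

```python
from statistics import mean, pvariance


def variance_threshold(X, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold."""
    n_features = len(X[0])
    return [j for j in range(n_features)
            if pvariance([row[j] for row in X]) > threshold]


def f_statistic(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Perfect separation (or a constant feature): avoid division by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(X, y, var_threshold=0.0, top_k=2):
    """Drop low-variance columns, then keep the top_k columns by F statistic."""
    kept = variance_threshold(X, var_threshold)
    scored = [(f_statistic([row[j] for row in X], y), j) for j in kept]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return sorted(j for _, j in scored[:top_k])


# Hypothetical toy data: 6 sequences x 4 hand-made features.
# Column 0 is constant, column 1 separates the classes strongly,
# column 2 is mostly noise, column 3 separates the classes weakly.
X = [
    [1.0, 0.90, 0.2, 0.5],
    [1.0, 0.80, 0.7, 0.6],
    [1.0, 0.85, 0.4, 0.4],
    [1.0, 0.10, 0.3, 0.3],
    [1.0, 0.20, 0.6, 0.2],
    [1.0, 0.15, 0.5, 0.4],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = ncRNA, 0 = other (labels are illustrative)

selected = select_features(X, y, var_threshold=0.0, top_k=2)
print("selected feature indices:", selected)
```

In this sketch the variance filter removes features that carry no information regardless of the labels, and the F statistic then ranks the remaining features by how well they separate the classes. In practice an equivalent pipeline is usually built from a library's existing selectors rather than written by hand.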
Pages: 23