ASLncR: a novel computational tool for prediction of abiotic stress-responsive long non-coding RNAs in plants

Authors
Upendra Kumar Pradhan
Prabina Kumar Meher
Sanchita Naha
Atmakuri Ramakrishna Rao
Ajit Gupta
Affiliations
[1] Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA
[2] Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA
[3] Indian Council of Agricultural Research (ICAR)
Source
Functional & Integrative Genomics | 2023 / Vol. 23
Keywords
Machine learning; Abiotic stress; Long non-coding RNA; Computational biology
DOI
Not available
Abstract
Abiotic stresses are detrimental to plant growth and development and have a major negative impact on crop yields. A growing body of evidence indicates that many long non-coding RNAs (lncRNAs) play key roles in abiotic stress responses. Identifying abiotic stress-responsive lncRNAs is therefore important for crop breeding programs that aim to develop cultivars resistant to abiotic stresses. In this study, we developed the first machine learning-based computational model for predicting abiotic stress-responsive lncRNAs. Stress-responsive and non-responsive lncRNA sequences formed the two classes of a binary classification dataset. The training dataset comprised 263 stress-responsive and 263 non-stress-responsive sequences, whereas the independent test set consisted of 101 sequences from both classes. Because machine learning models accept only numeric input, k-mer features of sizes 1 to 6 were used to encode the lncRNA sequences numerically. Four different feature selection strategies were applied to select important features. Among the seven learning algorithms evaluated, the support vector machine (SVM) achieved the highest cross-validation accuracy with the selected feature sets, with a 5-fold cross-validation accuracy, AU-ROC, and AU-PRC of 68.84%, 72.78%, and 75.86%, respectively. The robustness of the developed model (SVM with the selected features) was further evaluated on the independent test dataset, yielding an overall accuracy, AU-ROC, and AU-PRC of 76.23%, 87.71%, and 88.49%, respectively. The computational approach has also been implemented in an online prediction tool, ASLncR, accessible at https://iasri-sg.icar.gov.in/aslncr/. The proposed model and prediction tool are expected to supplement existing efforts to identify abiotic stress-responsive lncRNAs in plants.
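To illustrate what "k-mer features ranging from sizes 1 to 6" means in practice, the following is a minimal sketch, not the ASLncR implementation. It assumes overlapping windows, a fixed A/C/G/T vocabulary (with U mapped to T), and counts normalized by the number of windows; the abstract does not state which normalization the authors used, and the function name `kmer_features` is hypothetical. Under these assumptions each sequence becomes a vector of 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 values, which the feature selection step would then reduce.

```python
from itertools import product

# Assumption: sequences use the nucleotide alphabet; U is mapped to T.
ALPHABET = "ACGT"


def kmer_features(seq: str, k_min: int = 1, k_max: int = 6) -> list[float]:
    """Hypothetical helper: concatenated k-mer frequencies for k = k_min..k_max.

    For each k, every overlapping k-mer made only of A/C/G/T is counted and
    the count is divided by the number of k-length windows in the sequence.
    Windows containing other symbols (e.g. N) are not counted.
    """
    seq = seq.upper().replace("U", "T")
    features: list[float] = []
    for k in range(k_min, k_max + 1):
        # Fixed, lexicographically ordered vocabulary of all 4**k k-mers.
        vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(vocab, 0)
        windows = max(len(seq) - k + 1, 0)
        for i in range(windows):
            kmer = seq[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
        # Sequences shorter than k contribute an all-zero block.
        for v in vocab:
            features.append(counts[v] / windows if windows else 0.0)
    return features


if __name__ == "__main__":
    vec = kmer_features("AUGCAUGC")
    print(len(vec))  # one value per k-mer for k = 1..6
    print(vec[:4])   # mono-nucleotide frequencies in A, C, G, T order
```

A fixed vocabulary keeps the feature order identical across sequences, so the resulting vectors can be stacked directly into a matrix for feature selection and an SVM classifier.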