Computational prediction of disease related lncRNAs using machine learning

Cited by: 8
Authors
Khalid, Razia [1]
Naveed, Hammad [1]
Khalid, Zoya [2]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Dept Comp Sci, Computat Biol Res Lab, NUCES FAST, Islamabad, Pakistan
[2] Quaid I Azam Univ, Natl Ctr Bioinformat NCB, Islamabad, Pakistan
Keywords
LONG NONCODING RNAS; DATABASE; PROTEIN; HOTAIR; GENOME
DOI
10.1038/s41598-023-27680-7
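Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract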
Long non-coding RNAs (lncRNAs), once dismissed as transcriptional noise, are now a major focus of research. LncRNAs play a major role in regulating biological processes such as imprinting, cell differentiation, and splicing, and mutations in lncRNAs are involved in various complex diseases. Identifying lncRNA-disease associations has therefore attracted considerable attention, because predicting them efficiently should lead to better disease treatment. In this study, we developed a machine learning model that predicts disease-related lncRNAs by combining sequence-based and structure-based features. SVM and Random Forest classifiers were trained on these features. We compared our method with the state of the art and obtained the highest F1 score, 76%, with the SVM classifier. Moreover, this study overcomes two serious limitations of the previously reported method: the lack of redundancy checking and the use of oversampling to balance the positive and negative classes. Our method achieves improved performance among the machine learning models reported for lncRNA-disease association prediction. Combining multiple features, in particular lncRNA sequence mutations, contributes significantly to the prediction of disease-related lncRNAs.
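The abstract reports its headline result as an F1 score computed over sequence-derived features. As a minimal sketch of those two ideas, the Python below builds a normalized k-mer frequency vector from an RNA sequence and computes F1 from true and predicted labels. The abstract does not say which sequence features the authors used, so the k-mer composition, the function names `kmer_frequencies` and `f1_score`, and the default `k=2` are illustrative assumptions, not the paper's method.

```python
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Normalized k-mer frequencies over ACGU (a hypothetical feature choice)."""
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A feature vector such as `kmer_frequencies(seq)` could be passed to any standard classifier. F1 is a natural metric here because it balances precision and recall when the positive and negative classes differ in size.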
Pages: 7