Matrix factorization-based data fusion for the prediction of lncRNA-disease associations

Cited by: 173
Authors
Fu, Guangyuan [1 ]
Wang, Jun [1 ]
Domeniconi, Carlotta [2 ]
Yu, Guoxian [1 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
Keywords
LONG NONCODING RNA; BREAST-CANCER; INFORMATION; EXPRESSION; ONTOLOGY; GENES; CELLS; WT1;
DOI
10.1093/bioinformatics/btx794
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Long non-coding RNAs (lncRNAs) play crucial roles in complex disease diagnosis, prognosis, prevention and treatment, but only a small portion of lncRNA-disease associations have been experimentally verified. Various computational models have been proposed to identify lncRNA-disease associations by integrating heterogeneous data sources. However, existing models generally ignore the intrinsic structure of the data sources or treat them as equally relevant, although they may not be.
Results: To accurately identify lncRNA-disease associations, we propose a Matrix Factorization based LncRNA-Disease Association prediction model (MFLDA for short). MFLDA decomposes the data matrices of heterogeneous data sources into low-rank matrices via matrix tri-factorization to explore and exploit their intrinsic and shared structure. MFLDA can select and integrate the data sources by assigning different weights to them. An iterative solution is further introduced to simultaneously optimize the weights and low-rank matrices. Next, MFLDA uses the optimized low-rank matrices to reconstruct the lncRNA-disease association matrix and thus to identify potential associations. In 5-fold cross validation experiments to identify verified lncRNA-disease associations, MFLDA achieves an area under the receiver operating characteristic curve (AUC) of 0.7408, at least 3% higher than those given by state-of-the-art data fusion based computational models. An empirical study on identifying masked lncRNA-disease associations again shows that MFLDA can identify potential associations more accurately than competing models. A case study on identifying lncRNAs associated with breast, lung and stomach cancers shows that 38 out of 45 (84%) associations predicted by MFLDA are supported by recent biomedical literature, which further demonstrates the capability of MFLDA in identifying novel lncRNA-disease associations.
MFLDA is a general data fusion framework, and as such it can be adopted to predict associations between other biological entities.
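The reconstruction step described in the abstract can be illustrated with a minimal sketch of matrix tri-factorization on a single association matrix R, approximated as G1 S G2^T. This is not the MFLDA algorithm itself: it assumes one data source only, omits the per-source weights, and uses plain unconstrained alternating least squares in place of the paper's iterative solution. The names `tri_factorize` and `top_candidates` are hypothetical and introduced here only for illustration.

```python
import numpy as np


def tri_factorize(R, k1, k2, n_iter=50, seed=0):
    """Approximate R (n x m) as G1 @ S @ G2.T by alternating least squares.

    Illustrative assumption: unconstrained, unweighted, single matrix.
    Each update is the least-squares minimizer for one factor with the
    other two held fixed, so the error does not increase between steps.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    G1 = rng.standard_normal((n, k1))  # row-entity factors (e.g. lncRNAs)
    G2 = rng.standard_normal((m, k2))  # column-entity factors (e.g. diseases)
    errors = []
    for _ in range(n_iter):
        # Core matrix linking the two low-rank spaces.
        S = np.linalg.pinv(G1) @ R @ np.linalg.pinv(G2.T)
        # Update each entity factor with the other factors fixed.
        G1 = R @ np.linalg.pinv(S @ G2.T)
        G2 = (np.linalg.pinv(G1 @ S) @ R).T
        # Frobenius norm of the reconstruction residual.
        errors.append(float(np.linalg.norm(R - G1 @ S @ G2.T)))
    return G1, S, G2, errors


def top_candidates(R, R_hat, top=5):
    """Rank pairs with no known association (R == 0) by reconstructed score."""
    rows, cols = np.where(R == 0)
    order = np.argsort(-R_hat[rows, cols])[:top]
    return [(int(rows[i]), int(cols[i]), float(R_hat[rows[i], cols[i]]))
            for i in order]
```

In this sketch, the reconstructed matrix `G1 @ S @ G2.T` plays the role of the predicted association matrix, and the highest-scoring entries among the currently unobserved pairs are the candidate associations. Choosing `k1` and `k2` smaller than the matrix dimensions is what forces the factors to capture shared low-rank structure rather than memorize R.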
Pages: 1529-1537
Page count: 9