A text feature-based approach for literature mining of lncRNA-protein interactions

Cited by: 15
Authors
Li, Ao [1,2]
Zang, Qiguang [1]
Sun, Dongdong [1]
Wang, Minghui [1,2]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, 443 Huangshan Rd, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Ctr Biomed Engn, 443 Huangshan Rd, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
LncRNA-protein interaction; Text mining; Text features; Machine learning; NONCODING RNAS; DATABASE;
DOI
10.1016/j.neucom.2015.11.110
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long non-coding RNAs (lncRNAs) play important roles in regulation at the transcriptional and post-transcriptional levels. Knowledge of lncRNA-protein interactions (LPIs) is crucial for lncRNA-related biomedical research. Many newly discovered LPIs are reported only in the biomedical literature. With over one million new biomedical journal articles published every year, keeping up with novel findings requires automatic information extraction by text mining. To address this issue, we apply a text feature-based text mining approach to efficiently extract LPIs from the biomedical literature. Our approach consists of four steps. By employing natural language processing (NLP) technologies, it extracts text features from sentences that precisely reflect real LPIs. The pipeline covers data collection, text pre-processing, structured representation, feature extraction, and model training and classification. Our approach achieves an F-score of 79.5%, and the results indicate that it can efficiently extract LPIs from the biomedical literature. (C) 2016 Elsevier B.V. All rights reserved.
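As a rough illustration of what sentence-level text features for a candidate LPI might look like, and of how an F-score is computed, here is a minimal sketch. It is not the authors' implementation: the keyword list, the feature names, the helper functions `extract_features` and `f1_score`, and the example sentence are all assumptions made for illustration, and the abstract does not say which F-score variant was used (the balanced F1 is assumed here).

```python
import re

# Hypothetical interaction keywords; the paper's actual feature set
# is not described in the abstract.
INTERACTION_WORDS = {"binds", "interacts", "recruits", "associates"}
NEGATION_WORDS = {"not", "no"}


def extract_features(sentence, lncrna, protein):
    """Return a few toy text features for one candidate lncRNA-protein pair."""
    # Pre-processing: lowercase and split into simple alphanumeric tokens.
    tokens = re.findall(r"[A-Za-z0-9\-]+", sentence.lower())
    i = tokens.index(lncrna.lower())
    j = tokens.index(protein.lower())
    lo, hi = min(i, j), max(i, j)
    between = tokens[lo + 1:hi]
    return {
        # Number of tokens separating the two entity mentions.
        "token_distance": hi - lo - 1,
        # Whether an interaction keyword occurs between the two mentions.
        "keyword_between": any(t in INTERACTION_WORDS for t in between),
        # Whether the sentence contains a simple negation cue.
        "negation": any(t in NEGATION_WORDS for t in tokens),
    }


def f1_score(tp, fp, fn):
    """Balanced F-score from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence, not taken from the paper's corpus.
features = extract_features("HOTAIR binds PRC2 in human cells", "HOTAIR", "PRC2")
```

In a full pipeline, feature dictionaries like this would be turned into vectors and passed to a trained classifier; that step is omitted here because the abstract does not name the model.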
Pages: 73-80
Number of pages: 8