Literature mining of protein phosphorylation using dependency parse trees

Cited by: 5
Authors
Wang, Mang [1]
Xia, Hong [1]
Sun, Dongdong [1]
Chen, Zhaoxiong [2]
Wang, Minghui [1,3]
Li, Ao [1,3]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Peoples R China
[2] Univ Sci & Technol China, Sch Life Sci, Hefei 230026, Peoples R China
[3] Univ Sci & Technol China, Ctr Biomed Engn, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Phosphorylation; Dependency parse tree; Text mining; Systems biology; SYSTEM;
DOI
10.1016/j.ymeth.2014.01.008
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
As one of the most common post-translational modifications (PTMs), protein phosphorylation plays an important role in various biological processes, such as signal transduction, cellular metabolism, differentiation, growth, regulation and apoptosis. Knowledge of protein phosphorylation is of great value not only for elucidating the underlying molecular mechanisms but also for treating diseases and designing new drugs. Recently, there has been increasing interest in automatically extracting phosphorylation information from the biomedical literature. However, this remains a challenging task because of the tremendous volume of literature and the circuitous ways in which protein phosphorylation is described. To address this issue, we propose a novel text-mining method for efficiently retrieving and extracting protein phosphorylation information from the literature. Using natural language processing (NLP) techniques, the method transforms each sentence into a dependency parse tree that precisely reflects the relationships among phosphorylation-related keywords, from which detailed information on substrates, kinases and phosphorylation sites is extracted based on syntactic patterns. Compared with existing approaches, the proposed method demonstrates significantly improved performance, suggesting that it is a powerful bioinformatics approach for retrieving phosphorylation information from a large body of literature. A web server for the proposed method is freely available at http://bioinformatics.ustc.edu.cn/pptm/. (C) 2014 Elsevier Inc. All rights reserved.
Pages: 386-393
Page count: 8