Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines

Cited by: 1
Authors
Raja, Kalpana [1 ,2 ]
Natarajan, Jeyakumar [1 ]
Affiliations
[1] Bharathiar Univ, Sch Life Sci, Dept Bioinformat, Data Min & Text Min Lab, Coimbatore 641046, Tamil Nadu, India
[2] Univ Michigan, Sch Med, Dept Dermatol, Ann Arbor, MI USA
Keywords
Human protein phosphorylation; hPP corpus; Support Vector Machines; Natural language processing; Information extraction; Post transcriptional modification; EXTRACTION; DATABASE; SYSTEM;
DOI
10.1016/j.cmpb.2018.03.022
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background: Extraction of protein phosphorylation information from biomedical literature has gained much attention because of its importance in numerous biological processes. Objective: In this study, we propose a text mining methodology consisting of two phases, NLP parsing and SVM classification, to extract phosphorylation information from the literature. Methods: First, using NLP parsing, we divide the data into three base-forms depending on the biomedical entities related to phosphorylation, and further classify them into ten sub-forms based on their distribution with the phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify them using a set of features applicable to each sub-form. Results: The performance of our methodology was evaluated on three corpora, namely PLC, iProLink and the hPP corpus. We obtained promising results of >85% F-score on the ten sub-forms of the training datasets in cross-validation tests. Our system achieved an overall F-score of 93.0% on the iProLink and 96.3% on the hPP corpus test datasets. Furthermore, our proposed system achieved the best performance in cross-corpus evaluation and outperformed the existing system, with a recall of 90.1%. Conclusions: The performance analysis of our system on three corpora shows that it extracts protein phosphorylation information efficiently from both non-organism-specific general datasets, such as PLC and iProLink, and a human-specific dataset, the hPP corpus. (C) 2018 Elsevier B.V. All rights reserved.
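The abstract reports results as F-score and recall. As a reminder of how these measures relate, here is a minimal sketch of the standard definitions (the balanced F-score is the harmonic mean of precision and recall). The function name and the counts used below are hypothetical illustrations and are not taken from the paper; the paper's own evaluation scripts are not described in this record.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) from raw counts.

    tp: extracted items that are correct
    fp: extracted items that are wrong
    fn: correct items that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F-score={f:.3f}")
```

Because the F-score is a harmonic mean, it stays low when either precision or recall is low, which is why it is commonly reported alongside recall for extraction systems.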
Pages: 57-64
Number of pages: 8