Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines

Times cited: 1
Authors
Raja, Kalpana [1, 2]
Natarajan, Jeyakumar [1]
Affiliations
[1] Bharathiar Univ, Sch Life Sci, Dept Bioinformat, Data Min & Text Min Lab, Coimbatore 641046, Tamil Nadu, India
[2] Univ Michigan, Sch Med, Dept Dermatol, Ann Arbor, MI USA
Keywords
Human protein phosphorylation; hPP corpus; Support Vector Machines; Natural language processing; Information extraction; Post-translational modification; EXTRACTION; DATABASE; SYSTEM
DOI
10.1016/j.cmpb.2018.03.022
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Background: Extracting protein phosphorylation information from the biomedical literature has attracted considerable attention because phosphorylation is important in numerous biological processes.
Objective: In this study, we propose a two-phase text mining methodology, NLP parsing followed by SVM classification, to extract phosphorylation information from the literature.
Methods: First, NLP parsing divides the data into three base-forms according to the biomedical entities related to phosphorylation; these are further classified into ten sub-forms according to how the entities are distributed relative to the phosphorylation keyword. Next, we extract phosphorylation entity singles, pairs, and triplets and apply an SVM to classify them, using a set of features specific to each sub-form.
Results: The methodology was evaluated on three corpora: PLC, iProLink, and the hPP corpus. Cross-validation on the training datasets of the ten sub-forms yielded promising F-scores above 85%. On the test datasets, the system achieved overall F-scores of 93.0% on iProLink and 96.3% on the hPP corpus. In cross-corpus evaluation, the system also achieved the best performance and outperformed the existing system, with a recall of 90.1%.
Conclusions: The performance analysis on three corpora shows that the system efficiently extracts protein phosphorylation information both from non-organism-specific general datasets such as PLC and iProLink and from human-specific datasets such as the hPP corpus.
(C) 2018 Elsevier B.V. All rights reserved.
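The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical sketch of the general idea rather than the authors' implementation. It assumes three entity types (KINASE, SUBSTRATE, SITE), enumerates candidate singles, pairs, and triplets from an entity-tagged sentence, and classifies candidates with a linear SVM trained by plain subgradient descent on the hinge loss (swapped in for whatever SVM implementation and kernel the paper used). The function names `candidates`, `train_linear_svm`, `predict`, and `f_score`, the two toy features, the example entities, and all hyperparameters are illustrative assumptions. The F-score is assumed to be the balanced F1.

```python
# Hypothetical sketch: generate entity candidates, then classify them with a
# linear SVM. Entity types, features and hyperparameters are illustrative
# assumptions, not those used in the paper.
from itertools import product


def candidates(entities):
    """Enumerate singles, pairs and triplets from (text, type) tuples.

    Assumes every type is one of "KINASE", "SUBSTRATE" or "SITE".
    """
    by_type = {"KINASE": [], "SUBSTRATE": [], "SITE": []}
    for text, etype in entities:
        by_type[etype].append(text)
    kin, sub, site = by_type["KINASE"], by_type["SUBSTRATE"], by_type["SITE"]
    singles = [(s,) for s in sub]                               # substrate only
    pairs = list(product(kin, sub)) + list(product(sub, site))  # kinase-substrate, substrate-site
    triplets = list(product(kin, sub, site))                    # kinase-substrate-site
    return singles, pairs, triplets


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the linear soft-margin SVM objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), with y_i in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # L2 term; the bias b is not regularized
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge term is active for this example
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (phosphorylation relation) or -1 (no relation)."""
    return 1 if dot(w, x) + b >= 0 else -1


def f_score(tp, fp, fn):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ents = [("PKA", "KINASE"), ("CREB", "SUBSTRATE"), ("Ser133", "SITE")]
    print(candidates(ents))
    # -> ([('CREB',)], [('PKA', 'CREB'), ('CREB', 'Ser133')], [('PKA', 'CREB', 'Ser133')])

    # Toy features per candidate: [keyword lies between the entities (0/1),
    # normalized token distance]. Labels: +1 = true phosphorylation relation.
    X = [[1, 0.1], [1, 0.2], [0, 0.9], [0, 0.8]]
    y = [1, 1, -1, -1]
    w, b = train_linear_svm(X, y)
    print([predict(w, b, x) for x in X])  # -> [1, 1, -1, -1]
    print(round(f_score(tp=9, fp=1, fn=1), 3))  # -> 0.9
```

A linear kernel and hand-written subgradient descent keep the sketch dependency-free. A real system would more likely rely on an established SVM library, possibly with a non-linear kernel, and would tune the regularization strength with cross-validation.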
Pages: 57-64
Number of pages: 8