Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach

Cited by: 5
Authors
Qu, Jinchan [1]
Steppi, Albert [2]
Zhong, Dongrui [1]
Hao, Jie [1]
Wang, Jian [3]
Lung, Pei-Yau [4]
Zhao, Tingting [5]
He, Zhe [6]
Zhang, Jinfeng [1]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
[2] Harvard Med Sch, Lab Syst Pharmacol, Boston, MA 02115 USA
[3] CloudMedx, Palo Alto, CA 94301 USA
[4] Verisk Insurance Solut, Middletown, CT 06457 USA
[5] Florida State Univ, Dept Geog, Tallahassee, FL 32306 USA
[6] Florida State Univ, Coll Commun & Informat, Tallahassee, FL 32306 USA
Keywords
Protein-protein interactions; Mutations; Text mining; Biomedical literature retrieval; Protein interactions affected by mutations; INTERACTION EXTRACTION; EXPRESSION; DRUG; TOOL
DOI
10.1186/s12864-020-07185-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Information on protein-protein interactions affected by mutations is very useful for understanding the biological effects of mutations and for developing treatments that target those interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts, or paragraphs in full-text articles, that contain at least one occurrence of a protein-protein interaction (PPI) affected by a mutation.
Results: Our system makes use of the latest NLP methods together with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score.
Conclusions: The performance of our method indicates that it is well suited to being combined with manual annotation. Our machine learning framework and engineered features should also help other researchers further improve this and related biological text mining tasks, using either traditional machine learning or deep learning based methods.
Pages: 10