Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach

Cited by: 5
Authors
Qu, Jinchan [1]
Steppi, Albert [2]
Zhong, Dongrui [1]
Hao, Jie [1]
Wang, Jian [3]
Lung, Pei-Yau [4]
Zhao, Tingting [5]
He, Zhe [6]
Zhang, Jinfeng [1]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
[2] Harvard Med Sch, Lab Syst Pharmacol, Boston, MA 02115 USA
[3] CloudMedx, Palo Alto, CA 94301 USA
[4] Verisk Insurance Solut, Middletown, CT 06457 USA
[5] Florida State Univ, Dept Geog, Tallahassee, FL 32306 USA
[6] Florida State Univ, Coll Commun & Informat, Tallahassee, FL 32306 USA
Keywords
Protein-protein interactions; Mutations; Text mining; Biomedical literature retrieval; Protein interactions affected by mutations; INTERACTION EXTRACTION; EXPRESSION; DRUG; TOOL;
DOI
10.1186/s12864-020-07185-7
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Information on protein-protein interactions affected by mutations is very useful for understanding the biological effects of mutations and for developing treatments that target those interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts, or paragraphs in full-text articles, that contain at least one occurrence of a protein-protein interaction (PPI) affected by a mutation.
Results: Our system uses the latest NLP methods together with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score.
Conclusions: The performance of our method indicates that it is ideally suited to being combined with manual annotation. Our machine learning framework and engineered features should also help other researchers further improve this and related biological text mining tasks, using either traditional machine learning or deep learning based methods.
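The abstract reports recall and F1-score for the document triage task. As a minimal sketch of how those metrics are usually computed for a binary relevant/not-relevant decision, the following uses made-up labels; the function name `triage_metrics` and the example data are assumptions for illustration and are not taken from the paper or from the BioCreative VI evaluation scripts.

```python
def triage_metrics(gold, predicted):
    """Return (precision, recall, f1) for binary labels, where 1 = relevant."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # relevant, flagged
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # not relevant, flagged
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # relevant, missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: six documents, four of them truly relevant.
# A recall-oriented system flags everything, so no relevant document is missed.
gold = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 1, 1, 1, 1]

precision, recall, f1 = triage_metrics(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A system tuned this way trades precision for recall, which is the kind of profile that pairs naturally with a later manual annotation pass: curators discard false positives, while few relevant documents are lost upstream.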
Pages: 10