Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach

Cited by: 5
Authors
Qu, Jinchan [1]
Steppi, Albert [2]
Zhong, Dongrui [1]
Hao, Jie [1]
Wang, Jian [3]
Lung, Pei-Yau [4]
Zhao, Tingting [5]
He, Zhe [6]
Zhang, Jinfeng [1]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
[2] Harvard Med Sch, Lab Syst Pharmacol, Boston, MA 02115 USA
[3] CloudMedx, Palo Alto, CA 94301 USA
[4] Verisk Insurance Solut, Middletown, CT 06457 USA
[5] Florida State Univ, Dept Geog, Tallahassee, FL 32306 USA
[6] Florida State Univ, Coll Commun & Informat, Tallahassee, FL 32306 USA
Keywords
Protein-protein interactions; Mutations; Text mining; Biomedical literature retrieval; Protein interactions affected by mutations; INTERACTION EXTRACTION; EXPRESSION; DRUG; TOOL
DOI
10.1186/s12864-020-07185-7
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Information on protein-protein interactions (PPIs) affected by mutations is useful for understanding the biological effects of mutations and for developing treatments that target those interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts or paragraphs in full-text articles that contain at least one occurrence of a PPI affected by a mutation.
Results: Our system uses the latest NLP methods with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score.
Conclusions: The performance of our method indicates that it is ideally suited to being combined with manual annotation. Our machine learning framework and engineered features will also help other researchers further improve this and other related biological text mining tasks, using either traditional machine learning or deep learning based methods.
Pages: 10