XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties

Times cited: 59
Authors
Deng, Daiguo [1]
Chen, Xiaowei [1]
Zhang, Ruochi [1,2,3]
Lei, Zengrong [1]
Wang, Xiaojian [4]
Zhou, Fengfeng [2,3]
Affiliations
[1] Fermion Technol Co Ltd, Guangzhou 510000, Guangdong, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
[3] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
[4] Peking Union Med Coll & Chinese Acad Med Sci, Inst Mat Med, State Key Lab Bioact Subst & Funct Nat Med, Beijing 100050, Peoples R China
Keywords
MACHINE LEARNING-METHODS; QSAR; VALIDATION; REGRESSION; TOOL
DOI
10.1021/acs.jcim.0c01489
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Discipline code
100701
Abstract
Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are further evaluated for their target binding affinities, side effects, target missing probabilities, etc. Conventional machine learning algorithms have demonstrated satisfactory accuracy in predicting molecular properties. However, a molecule cannot be fed directly into a machine learning model: a set of engineered features must first be designed and calculated from it, and such hand-crafted features rely heavily on the experience of the researchers. Graph neural networks (GNNs) were recently introduced to represent chemical molecules, so that features can be extracted automatically and objectively through various types of GNNs, e.g., GCN (graph convolutional network), GGNN (gated graph neural network), and DMPNN (directed message passing neural network). However, training a stable GNN model requires far more training samples and computing power than conventional machine learning strategies. This study proposes XGraphBoost, an integrated framework that extracts features with a GNN and builds an accurate prediction model of molecular properties with XGBoost. XGraphBoost thus inherits both the automatic molecular feature extraction of GNNs and the accurate prediction performance of XGBoost. The framework was evaluated on both classification and regression problems, and the experimental results strongly suggest that XGraphBoost may facilitate efficient and accurate prediction of various molecular properties. The source code is freely available to academic users at https://github.com/chenxiaowei-vincent/XGraphBoost.git.
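The abstract describes a two-stage design: a GNN turns each molecular graph into a fixed-length feature vector, and XGBoost then learns the property from those vectors. The pure-Python sketch below illustrates only that data flow and is not the XGraphBoost source code. All of its specifics are hypothetical assumptions: atoms are one-hot encoded over a toy vocabulary (`ELEMENTS`), a GCN-like message-passing step with fixed random weights `W` stands in for a trained GNN, plain gradient boosting with decision stumps (`fit_stump`, `fit_boosting`) stands in for XGBoost, and the regression target is a made-up label (heteroatom count) rather than a real molecular property.

```python
"""Toy two-stage pipeline: GNN-style molecule embeddings -> boosted trees.

Hypothetical sketch, not the XGraphBoost source code:
  * atoms are one-hot encoded over a tiny element vocabulary,
  * message passing uses fixed random (untrained) weights,
  * plain gradient boosting with decision stumps stands in for XGBoost.
"""
import math
import random
from typing import Dict, List, Tuple

ELEMENTS = ["C", "N", "O"]                       # toy atom vocabulary (assumption)
Graph = Tuple[List[str], List[Tuple[int, int]]]  # (atom symbols, undirected bonds)
Stump = Tuple[int, float, float, float]          # (feature, threshold, left value, right value)


def gnn_embedding(graph: Graph, W: List[List[float]], rounds: int = 2) -> List[float]:
    """Each round: h_v <- tanh(W @ (h_v + sum of neighbour h_u)); then sum readout."""
    atoms, bonds = graph
    nbrs: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)
    h = [[1.0 if s == e else 0.0 for e in ELEMENTS] for s in atoms]  # one-hot atom features
    for _ in range(rounds):
        new_h = []
        for v in range(len(atoms)):
            agg = list(h[v])
            for u in nbrs[v]:  # aggregate messages from bonded neighbours
                agg = [x + y for x, y in zip(agg, h[u])]
            new_h.append([math.tanh(sum(w * x for w, x in zip(row, agg))) for row in W])
        h = new_h
    # Summing over atoms gives a fixed-length molecule vector that is invariant
    # to atom ordering (up to floating-point rounding).
    return [sum(col) for col in zip(*h)]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Single split minimising squared error against the residuals r."""
    mean_r = sum(r) / len(r)
    # Fallback: a constant stump (no split) predicting the mean residual.
    best: Tuple[float, Stump] = (sum((ri - mean_r) ** 2 for ri in r), (0, math.inf, mean_r, mean_r))
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:  # skip max value: right side would be empty
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
            if sse < best[0]:
                best = (sse, (j, t, lm, rm))
    return best[1]


def fit_boosting(X: List[List[float]], y: List[float], n_rounds: int = 50, lr: float = 0.3):
    """Gradient boosting on squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 1/2 squared loss
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        pred = [p + lr * (lv if x[j] <= t else rv) for p, x in zip(pred, X)]
    return base, lr, stumps


def predict(model, x: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps)


# Heavy-atom graphs of a few small molecules; the target is a made-up toy
# label (number of O and N atoms), used only to exercise the pipeline.
molecules: List[Graph] = [
    (["C", "O"], [(0, 1)]),                            # methanol
    (["C", "C", "O"], [(0, 1), (1, 2)]),               # ethanol
    (["C", "O", "C"], [(0, 1), (1, 2)]),               # dimethyl ether
    (["C", "C", "C"], [(0, 1), (1, 2)]),               # propane
    (["C", "N"], [(0, 1)]),                            # methylamine
    (["O", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]),  # ethylene glycol
]
y = [float(sum(s in ("N", "O") for s in atoms)) for atoms, _ in molecules]

rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in ELEMENTS] for _ in ELEMENTS]
X = [gnn_embedding(g, W) for g in molecules]  # stage 1: GNN-style features
model = fit_boosting(X, y)                    # stage 2: boosted trees on those features

baseline_mse = sum((yi - sum(y) / len(y)) ** 2 for yi in y) / len(y)
train_mse = sum((predict(model, x) - yi) ** 2 for x, yi in zip(X, y)) / len(y)
print(f"baseline MSE {baseline_mse:.3f} -> boosted MSE {train_mse:.3f}")
```

With the real libraries, the embedding would come from a trained GNN such as GCN, GGNN or DMPNN, and the feature matrix could be handed to the `xgboost` package, e.g. `xgboost.XGBRegressor().fit(X, y)` for regression or `xgboost.XGBClassifier().fit(X, y)` for classification. Keeping the two stages separate means the boosted model can be retuned on cached embeddings without retraining the GNN.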
Pages: 2697-2705
Page count: 9