Classifying the toxicity of pesticides to honey bees via support vector machines with random walk graph kernels

Cited by: 7
Authors
Yang, Ping [1]
Henle, E. Adrian [1]
Fern, Xiaoli Z. [2]
Simon, Cory M. [1]
Affiliations
[1] Oregon State Univ, Sch Chem Biol & Environm Engn, Corvallis, OR 97331 USA
[2] Oregon State Univ, Sch Elect Engn & Comp Sci, Corvallis, OR 97331 USA
Source
JOURNAL OF CHEMICAL PHYSICS | 2022 / Vol. 157 / Issue 03
Funding
National Science Foundation (USA);
Keywords
ACUTE CONTACT TOXICITY; PROTEIN-LIGAND DOCKING; APIS-MELLIFERA; NEONICOTINOID INSECTICIDES; PREDICTION; EXPOSURE; CLASSIFICATION; AGRICULTURE; POLLINATORS; RESISTANCE;
DOI
10.1063/5.0090573
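Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code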
070304 ; 081704 ;
Abstract
Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are valuable as pollinators. Thus, candidate pesticides in development pipelines must be assessed for toxicity to bees. Leveraging a dataset of 382 molecules with toxicity labels from honey bee exposure experiments, we train a support vector machine (SVM) to predict the toxicity of pesticides to honey bees. We compare two representations of the pesticide molecules: (i) a random walk feature vector listing counts of length-L walks on the molecular graph with each vertex- and edge-label sequence and (ii) the Molecular ACCess System (MACCS) structural key fingerprint (FP), a bit vector indicating the presence/absence of a list of pre-defined subgraph patterns in the molecular graph. We explicitly construct the MACCS FPs but rely on the fixed-length-L random walk graph kernel (RWGK) in place of the dot product for the random walk representation. The L-RWGK-SVM achieves an accuracy, precision, recall, and F1 score (mean over 2000 runs) of 0.81, 0.68, 0.71, and 0.69, respectively, on the test data set, with L = 4 being the most frequently selected (mode) optimal walk length. The MACCS-FP-SVM performs on par with or marginally better than the L-RWGK-SVM and lends more interpretability, but varies more in performance. We interpret the MACCS-FP-SVM by illuminating which subgraph patterns in the molecules tend to strongly push them toward the toxic/non-toxic side of the separating hyperplane. Published under an exclusive license by AIP Publishing.
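The random walk representation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it builds the walk-count feature vectors explicitly by brute-force enumeration and takes their dot product, whereas the abstract states that the paper relies on the kernel instead of constructing these vectors. The function names (`walk_label_counts`, `rw_kernel`), the graph encoding, and the two toy molecules (heavy atoms only, bond order as edge label) are assumptions made for illustration.

```python
from collections import Counter


def walk_label_counts(vertex_labels, edges, L):
    """Count length-L walks on a labeled graph, keyed by label sequence.

    vertex_labels: list of vertex labels, e.g. ["C", "O"] (hypothetical encoding)
    edges: dict mapping a vertex pair (u, v) to an edge label, e.g. {(0, 1): 1}
    A walk with L edges is keyed by (v0, e1, v1, ..., eL, vL) in labels.
    """
    # Undirected adjacency list: each edge can be walked in both directions.
    adj = {u: [] for u in range(len(vertex_labels))}
    for (u, v), label in edges.items():
        adj[u].append((v, label))
        adj[v].append((u, label))

    # Start with every length-0 walk (a single vertex), then extend L times.
    # Brute force: the number of walks grows quickly with L on larger graphs.
    frontier = [(v, (vertex_labels[v],)) for v in adj]
    for _ in range(L):
        frontier = [
            (w, seq + (label, vertex_labels[w]))
            for v, seq in frontier
            for w, label in adj[v]
        ]
    return Counter(seq for _, seq in frontier)


def rw_kernel(g1, g2, L):
    """Dot product of the two walk-count vectors: the number of pairs of
    length-L walks, one per graph, sharing the same label sequence."""
    c1 = walk_label_counts(*g1, L)
    c2 = walk_label_counts(*g2, L)
    return sum(n * c2[seq] for seq, n in c1.items())


# Toy molecular graphs (assumption: hydrogens omitted, edge label = bond order).
methanol = (["C", "O"], {(0, 1): 1})
ethanol = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})

print(rw_kernel(methanol, ethanol, 1))  # 2
print(rw_kernel(ethanol, ethanol, 1))   # 6
```

A Gram matrix of such kernel values over a set of molecules could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; whether the authors used that library is not stated here.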
Pages: 13