Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers

Times cited: 26
Authors
Tyzack, Jonathan D. [1]
Mussa, Hamse Y. [1]
Williamson, Mark J. [1]
Kirchmair, Johannes [2]
Glen, Robert C. [1]
Affiliations
[1] Univ Cambridge, Dept Chem, Unilever Ctr Mol Sci Informat, Cambridge CB2 1EW, England
[2] ETH, Dept Chem & Appl Biosci, Inst Pharmaceut Sci, CH-8093 Zurich, Switzerland
Source
JOURNAL OF CHEMINFORMATICS | 2014 / Volume 6
Keywords
Cytochrome P450; Metabolism; Probabilistic; Classification; GPU; CUDA; 2D; XENOBIOTIC METABOLISM; ACTIVATION-ENERGIES; DRUG-METABOLISM; RS-PREDICTOR; TOOL; REGIOSELECTIVITY; SMARTCYP; PK(A)
DOI
10.1186/1758-2946-6-29
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work, 2D topological fingerprints are used to encode atomic sites and three probabilistic machine learning methods are applied: Parzen-Rosenblatt Window (PRW), Naive Bayesian (NB) and a novel approach called RASCAL (Random Attribute Subsampling Classification ALgorithm). These are implemented by randomly subsampling descriptor space to alleviate the problem, often suffered by data mining methods, of having to match fingerprints exactly, and in the case of PRW by measuring a distance between feature vectors rather than exact matching. The classifiers have been implemented in CUDA/C++ to exploit the parallel architecture of graphics processing units (GPUs) and are freely available in a public repository.
Results: It is shown that for PRW a SoM (Site of Metabolism) is identified in the top two predictions for 85%, 91% and 88% of the CYP 3A4, 2D6 and 2C9 data sets respectively, with RASCAL giving similar performance of 83%, 91% and 88%, respectively. These results put PRW and RASCAL performance ahead of NB, which gave a much lower classification performance of 51%, 73% and 74%, respectively.
Conclusions: 2D topological fingerprints calculated to a bond depth of 4-6 contain sufficient information to allow the identification of SoMs using classifiers based on relatively small data sets. Thus, the machine learning methods outlined in this paper are conceptually simpler and more efficient than other methods tested, and the use of simple topological descriptors derived from 2D structure gives results competitive with other approaches using more expensive quantum chemical descriptors. The descriptor space subsampling approach and ensemble methodology allow the methods to be applied to molecules more distant from the training data, where data mining would be more likely to fail due to the lack of common fingerprints. The RASCAL algorithm is shown to give equivalent classification performance to PRW but at lower computational expense, allowing it to be applied more efficiently in the ensemble scheme.
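The combination of distance-based scoring and random descriptor subsampling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CUDA/C++ implementation: the Gaussian kernel over Hamming distance, the bandwidth `h`, the subset size, the number of ensemble members, and all function names are assumptions chosen for illustration only.

```python
import math
import random

def hamming(a, b, idx):
    """Hamming distance between two bit vectors, restricted to positions in idx."""
    return sum(a[i] != b[i] for i in idx)

def prw_score(query, train_x, train_y, idx, h=1.0):
    """Parzen-Rosenblatt-style kernel vote on the attribute subset idx.

    Each training site contributes a Gaussian weight that decays with its
    distance to the query; the score is the share of weight from SoM sites.
    """
    pos = neg = 0.0
    for x, y in zip(train_x, train_y):
        w = math.exp(-hamming(query, x, idx) ** 2 / (2.0 * h * h))
        if y == 1:
            pos += w
        else:
            neg += w
    total = pos + neg
    # Guard against numerical underflow when every weight rounds to zero.
    return pos / total if total > 0.0 else 0.5

def ensemble_score(query, train_x, train_y, n_models=25, subset_size=4,
                   h=1.0, seed=0):
    """Average the kernel vote over randomly drawn attribute subsets."""
    rng = random.Random(seed)
    n_bits = len(query)
    scores = []
    for _ in range(n_models):
        idx = rng.sample(range(n_bits), subset_size)  # random descriptor subset
        scores.append(prw_score(query, train_x, train_y, idx, h))
    return sum(scores) / len(scores)

def rank_sites(sites, train_x, train_y, **kwargs):
    """Return site indices ordered from most to least likely SoM."""
    scored = [(ensemble_score(fp, train_x, train_y, **kwargs), i)
              for i, fp in enumerate(sites)]
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]

# Toy 6-bit fingerprints (hypothetical data, label 1 = site of metabolism).
train_x = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0],
           [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1]]
train_y = [1, 1, 0, 0]
sites = [[0, 0, 0, 0, 1, 1], [1, 1, 1, 1, 0, 0]]
ranking = rank_sites(sites, train_x, train_y, subset_size=4)
```

In a top-two evaluation of the kind reported in the abstract, a molecule would count as correctly predicted when a known SoM appears among the first two indices of such a ranking.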
Pages: 14