Ligand Prediction from Protein Sequence and Small Molecule Information Using Support Vector Machines and Fingerprint Descriptors

Cited by: 46
Authors
Geppert, Hanna [1]
Humrich, Jens [2]
Stumpfe, Dagmar [1]
Gaertner, Thomas [2]
Bajorath, Juergen [1]
Affiliations
[1] Rhein Friedrich Wilhelms Univ Bonn, Dept Life Sci Informat, B IT, LIMES Program Unit Chem Biol & Med Chem, D-53113 Bonn, Germany
[2] Fraunhofer Inst Intelligent Anal & Informat Syst, D-53754 St Augustin, Germany
Keywords
AIDED CHEMICAL BIOLOGY; DRUG DISCOVERY; SIMILARITY; DESIGN; SELECTIVITY; DATABASE; 2D; CHEMOGENOMICS; INHIBITION; BINDINGDB;
DOI
10.1021/ci900004a
Chinese Library Classification
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
Support vector machine (SVM) database search strategies are presented that aim at the identification of small molecule ligands for targets for which no ligand information is currently available. In pharmaceutical research and chemical biology, this situation is faced, for example, when studying orphan targets or newly identified members of protein families. To investigate methods for de novo ligand identification in the absence of known three-dimensional target structures or active molecules, we have focused on combining sequence and ligand information for closely and distantly related proteins. To provide a basis for these investigations, a set of 11 protease targets from different families was assembled together with more than 2000 inhibitors directed against individual proteases. We have compared SVM approaches that combine protein sequence and ligand information in different ways and utilize 2D fingerprints as ligand descriptors. These methodologies were applied to search for inhibitors of individual proteases not taken into account during learning. A target sequence-ligand kernel and, in particular, a linear combination of multiple target-directed SVMs consistently identified inhibitors with high accuracy, including test cases where homology-based similarity searching using data fusion and conventional SVM ranking nearly or completely failed. The SVM linear combination and target-ligand kernel methods described herein are intuitive and straightforward to adopt for ligand prediction against other targets.
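The abstract names a target sequence-ligand kernel but does not give its form. The following is a minimal sketch of one common way such a kernel can be built, assuming it is the product of a target similarity and a Tanimoto similarity on 2D fingerprints. The function names, the `target_sim` lookup, the toy protease names, and the fingerprint bits are all hypothetical and are not taken from the paper.

```python
# Sketch of a product kernel on (target, ligand) pairs.
# Assumptions (not from the paper): fingerprints are sets of on-bit indices,
# target similarities are precomputed values in [0, 1] that form a positive
# semidefinite matrix, and the two similarities are combined by multiplication.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def target_ligand_kernel(pair_x, pair_y, target_sim):
    """Kernel value for two (target_name, fingerprint) pairs."""
    (t_x, fp_x), (t_y, fp_y) = pair_x, pair_y
    # Identical targets get similarity 1; otherwise use the hypothetical lookup.
    k_target = 1.0 if t_x == t_y else target_sim[frozenset((t_x, t_y))]
    return k_target * tanimoto(fp_x, fp_y)


def gram_matrix(pairs, target_sim):
    """Full kernel (Gram) matrix over a list of (target, fingerprint) pairs."""
    return [[target_ligand_kernel(p, q, target_sim) for q in pairs]
            for p in pairs]


# Toy data: two hypothetical proteases with an assumed sequence similarity.
target_sim = {frozenset(("protease_A", "protease_B")): 0.4}
pairs = [
    ("protease_A", {1, 5, 9}),
    ("protease_B", {1, 5, 7}),
    ("protease_A", {2, 3}),
]
K = gram_matrix(pairs, target_sim)
```

A matrix like `K` could then be handed to an SVM implementation that accepts precomputed kernels, so that ligands known for related targets contribute to the ranking for a target with no known ligands.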
Pages: 767-779
Page count: 13