On combining recursive partitioning and simulated annealing to detect groups of biologically active compounds

Cited by: 49
Authors
Blower, P
Fligner, M
Verducci, J
Bjoraker, J
Affiliations
[1] Leadscope Inc, Columbus, OH 43212 USA
[2] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002 / Vol. 42 / Issue 02
DOI
10.1021/ci0101049
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Statistical data mining methods have proven to be powerful tools for investigating correlations between molecular structure and biological activity. Recursive partitioning (RP), in particular, offers several advantages in mining large, diverse data sets resulting from high throughput screening. When used with binary molecular descriptors, the standard implementation of RP splits on single descriptors. We use simulated annealing (SA) to find combinations of molecular descriptors whose simultaneous presence best separates off the most active, chemically similar group of compounds. The search is incorporated into a recursive partitioning design to produce a regression tree for biological activity on the space of structural fingerprints. Each node is characterized by a specific combination of structural features, and the terminal nodes with high average activities correspond, roughly, to different classes of compounds. Using LeadScope structural features as descriptors to mine a database from the National Cancer Institute, the merging of RP and SA consistently identifies structurally homogeneous classes of highly potent anticancer agents.
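The search step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the score function, the cooling schedule, the one-feature toggle move, the minimum group size of two, and the toy data are all assumptions made for illustration. It shows only the simulated annealing search for a single split; a recursive partitioning tree would repeat the search on the compounds left over after each split.

```python
import math
import random


def split_score(X, y, features):
    """Score the split 'compounds having ALL of `features`' vs. the rest.

    Assumed score (not taken from the paper): the gap in mean activity,
    scaled by sqrt(group size) so that very small groups are penalised.
    Splits with fewer than two compounds inside, or none outside, are invalid.
    """
    inside = [yi for row, yi in zip(X, y) if all(row[j] for j in features)]
    if not features or len(inside) < 2 or len(inside) == len(y):
        return float("-inf")
    outside_mean = (sum(y) - sum(inside)) / (len(y) - len(inside))
    gap = sum(inside) / len(inside) - outside_mean
    return gap * math.sqrt(len(inside))


def anneal_feature_set(X, y, n_steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over subsets of binary descriptors."""
    rng = random.Random(seed)
    n_features = len(X[0])
    current, current_score = set(), float("-inf")  # start from the empty set
    best, best_score = set(), float("-inf")
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        # Propose a neighbour by toggling one randomly chosen descriptor.
        candidate = current ^ {rng.randrange(n_features)}
        cand_score = split_score(X, y, candidate)
        if cand_score == float("-inf"):
            continue  # never move to an invalid split
        delta = cand_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
    return best, best_score


# Hypothetical toy data: rows are compounds, columns are binary descriptors.
# Only compounds carrying descriptors 0 AND 2 together are highly active.
TOY_X = [
    [1, 0, 1, 0], [1, 0, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0],
    [1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 1], [0, 1, 0, 1],
    [0, 0, 0, 1], [1, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 0],
]
TOY_Y = [9.0, 8.5, 9.5, 1.0, 1.5, 0.5, 1.0, 1.2, 0.8, 1.1, 0.9, 1.0]

if __name__ == "__main__":
    features, score = anneal_feature_set(TOY_X, TOY_Y)
    print(sorted(features), round(score, 3))
```

Neither descriptor 0 nor descriptor 2 isolates the active group on its own in this toy set, which is the situation the abstract contrasts with standard single-descriptor splits.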
Pages: 393-404
Page count: 12