SIMILARITY SEARCHING AND CLUSTERING OF CHEMICAL-STRUCTURE DATABASES USING MOLECULAR PROPERTY DATA

Cited by: 121
Authors
DOWNS, GM
WILLETT, P
FISANICK, W
Affiliations
[1] UNIV SHEFFIELD,DEPT INFORMAT STUDIES,SHEFFIELD S10 2TN,S YORKSHIRE,ENGLAND
[2] CHEM ABSTRACTS SERV INC,COLUMBUS,OH 43210
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1994 / Vol. 34 / Issue 05
DOI
10.1021/ci00021a011
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Previous work on the clustering of chemical-structure databases has focused on the use of intermolecular similarity measures that are based on structural features of various kinds. In this paper, we report nearest-neighbor searching and clustering experiments with a set of 5982 molecules, each of which is characterized by 13 calculated global molecular properties. The nearest-neighbor algorithm is an upper-bound procedure that uses the triangle inequality to minimize the number of distance calculations that need to be carried out when searching for nearest neighbors in metric spaces. Our experiments suggest that it performs well when small numbers of nearest neighbors are required, but that the basic "brute-force" procedure is best when large numbers are needed, such as when clustering is to be carried out. The clustering methods tested are the Ward and group-average hierarchic agglomerative methods, the minimum-diameter polythetic hierarchic divisive method, and the Jarvis-Patrick nearest-neighbor method. Our experiments suggest that the first three methods, which gave similar results, are the best methods for clustering molecules characterized by property data. The Jarvis-Patrick method, which has been extensively used for clustering molecules characterized by structural fragments, was not as effective as these other methods.
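The idea of using the triangle inequality to avoid distance calculations can be illustrated with a minimal sketch. This is not the upper-bound procedure evaluated in the paper, whose details the abstract does not give; it is a generic single-pivot pruning scheme, assuming Euclidean distance over property vectors. The function names (`build_index`, `knn_pruned`, `knn_brute`) and the choice of pivot are hypothetical. For any pivot p, |d(q, p) - d(x, p)| is a lower bound on d(q, x), so a candidate x can be skipped whenever that bound is no smaller than the current k-th best distance.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length property vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(points, pivot):
    """Precompute the distance from every point to one reference pivot."""
    return [euclidean(p, pivot) for p in points]


def knn_pruned(points, pivot, pivot_dists, query, k):
    """Return (sorted [(distance, index)], number of distances computed).

    Assumption: a single pivot and Euclidean distance; illustrative only.
    """
    d_qp = euclidean(query, pivot)
    # Visit candidates in order of increasing triangle-inequality lower bound.
    order = sorted(range(len(points)),
                   key=lambda i: abs(d_qp - pivot_dists[i]))
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    n_computed = 0
    for i in order:
        lower = abs(d_qp - pivot_dists[i])
        if len(best) == k and lower >= -best[0][0]:
            break  # every remaining candidate has an even larger lower bound
        d = euclidean(query, points[i])
        n_computed += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best), n_computed


def knn_brute(points, query, k):
    """Baseline: compute every distance, then keep the k smallest."""
    return sorted((euclidean(query, p), i) for i, p in enumerate(points))[:k]
```

As k grows, the k-th best distance grows with it and fewer candidates can be pruned, which is consistent with the abstract's observation that brute force is preferable when large numbers of neighbors are needed.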
Pages: 1094-1102
Page count: 9