SIMILARITY SEARCHING AND CLUSTERING OF CHEMICAL-STRUCTURE DATABASES USING MOLECULAR PROPERTY DATA

Times cited: 121
Authors
DOWNS, GM
WILLETT, P
FISANICK, W
Affiliations
[1] UNIV SHEFFIELD,DEPT INFORMAT STUDIES,SHEFFIELD S10 2TN,S YORKSHIRE,ENGLAND
[2] CHEM ABSTRACTS SERV INC,COLUMBUS,OH 43210
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1994 / Vol. 34 / No. 5
DOI
10.1021/ci00021a011
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline code
0703
Abstract
Previous work on the clustering of chemical-structure databases has focused on the use of intermolecular similarity measures that are based on structural features of various kinds. In this paper, we report nearest-neighbor searching and clustering experiments with a set of 5982 molecules, each of which is characterized by 13 calculated global molecular properties. The nearest-neighbor algorithm is an upper-bound procedure that uses the triangle inequality to minimize the number of distance calculations that need to be carried out when searching for nearest neighbors in metric spaces. Our experiments suggest that it performs well when small numbers of nearest neighbors are required, but that the basic "brute-force" procedure is best when large numbers are needed, such as when clustering is to be carried out. The clustering methods tested are the Ward and group-average hierarchic agglomerative methods, the minimum-diameter polythetic hierarchic divisive method, and the Jarvis-Patrick nearest-neighbor method. Our experiments suggest that the first three methods, which gave similar results, are the best methods for clustering molecules characterized by property data. The Jarvis-Patrick method, which has been extensively used for clustering molecules characterized by structural fragments, was not as effective as these other methods.
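The abstract does not spell out the authors' upper-bound procedure, so the following is only a minimal sketch of the general idea it names: using the triangle inequality to skip distance calculations in a metric space. It assumes Euclidean distance over property vectors and a single, arbitrarily chosen reference point ("pivot"). Since d(q, x) >= |d(q, p) - d(x, p)| for any pivot p, items are visited in order of this lower bound, and the search stops once no remaining item can beat the current k-th best distance. The names `PivotIndex`, `knn`, and `brute_force_knn` are hypothetical, and the data in the demo are random numbers, not the 5982-molecule set.

```python
import bisect
import heapq
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; being a metric, it satisfies the triangle inequality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(items: List[Vector], query: Vector, k: int) -> List[Tuple[float, int]]:
    """Baseline: compute every query-to-item distance and keep the k smallest."""
    return sorted((euclidean(query, v), i) for i, v in enumerate(items))[:k]


class PivotIndex:
    """Hypothetical single-reference-point index for triangle-inequality pruning."""

    def __init__(self, items: List[Vector], pivot: Vector) -> None:
        self.items = items
        self.pivot = pivot
        # Precompute d(x, p) for every item x and sort the items by it.
        self.keyed = sorted((euclidean(v, pivot), i) for i, v in enumerate(items))
        self.keys = [d for d, _ in self.keyed]

    def knn(self, query: Vector, k: int) -> Tuple[List[Tuple[float, int]], int]:
        """Return the k nearest (distance, index) pairs and the number of
        query-to-item distances actually computed."""
        if k <= 0:
            return [], 0
        dq = euclidean(query, self.pivot)
        best: List[Tuple[float, int]] = []  # max-heap of (-distance, index)
        computed = 0
        n = len(self.keyed)
        hi = bisect.bisect_left(self.keys, dq)  # first item with d(x, p) >= dq
        lo = hi - 1                             # last item with d(x, p) < dq
        while lo >= 0 or hi < n:
            # Triangle inequality: d(q, x) >= |d(q, p) - d(x, p)|.
            lb_lo = dq - self.keys[lo] if lo >= 0 else math.inf
            lb_hi = self.keys[hi] - dq if hi < n else math.inf
            if lb_lo <= lb_hi:
                bound, pos = lb_lo, lo
                lo -= 1
            else:
                bound, pos = lb_hi, hi
                hi += 1
            # Bounds only grow as we move outward, so no remaining item can
            # beat the current k-th best distance: stop early.
            if len(best) == k and bound >= -best[0][0]:
                break
            idx = self.keyed[pos][1]
            d = euclidean(query, self.items[idx])
            computed += 1
            if len(best) < k:
                heapq.heappush(best, (-d, idx))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, idx))
        return sorted((-neg_d, idx) for neg_d, idx in best), computed


if __name__ == "__main__":
    # Synthetic data only: 2000 random 13-dimensional "property" vectors.
    rng = random.Random(42)
    data = [[rng.random() for _ in range(13)] for _ in range(2000)]
    index = PivotIndex(data, pivot=[0.0] * 13)
    query = [rng.random() for _ in range(13)]
    for k in (1, 10, 500):
        hits, n_calc = index.knn(query, k)
        assert hits == brute_force_knn(data, query, k)
        print(f"k={k}: {n_calc} of {len(data)} distances computed")
```

How many calculations are skipped depends heavily on the data and on the choice of reference point; with one pivot in 13 dimensions the savings may be modest. As k grows, the k-th best distance grows too, so the stopping test fires later. This is consistent with the abstract's observation that such pruning helps most for small numbers of neighbors, while a brute-force pass is preferable when many neighbors are needed, as in clustering.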
Pages: 1094-1102
Number of pages: 9
Related papers (50 in total)
  • [41] CHEMICAL-STRUCTURE AND THE IMMUNOSTIMULATING ACTIVITY OF HIGH MOLECULAR POLYSACCHARIDES
    SINILOVA, NG
    DUPLISCHEVA, AP
    ROMASHEVSKAYA, EI
    MYSYAKIN, EB
    TUMANYAN, MA
    VOPROSY MEDITSINSKOI KHIMII, 1987, 33 (06): : 103 - 107
  • [42] CHEMICAL-STRUCTURE PROCESSING USING GENETIC ALGORITHMS
    JONES, G
    WILLETT, P
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1993, 206 : 1 - CINF
  • [43] IMPLEMENTATION OF NEAREST-NEIGHBOR SEARCHING IN AN ONLINE CHEMICAL-STRUCTURE SEARCH SYSTEM
    WILLETT, P
    WINTERMAN, V
    BAWDEN, D
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1986, 26 (01): : 36 - 41
  • [44] NEW DATA ON THE CHEMICAL-STRUCTURE OF LIPOPOLYSACCHARIDES AND PRACTICAL PROSPECTS
    YAROVAYA, LM
    ALESHKIN, VA
    ZHURNAL MIKROBIOLOGII EPIDEMIOLOGII I IMMUNOBIOLOGII, 1991, (03): : 73 - 78
  • [45] TOWARDS A STANDARD INTERCHANGE FORMAT FOR CHEMICAL-STRUCTURE DATA
    BARNARD, JM
    ONLINE INFORMATION 88, PROCEEDINGS VOLS 1-2, 1988, : 605 - 609
  • [46] Structure searching using smiles and relational databases.
    Sayle, RA
    Delany, JJ
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2001, 222 : U268 - U268
  • [47] Chemical similarity using physiochemical property descriptors
    Kearsley, SK
    Sallamack, S
    Fluder, EM
    Andose, JD
    Mosley, RT
    Sheridan, RP
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1996, 36 (01): : 118 - 127
  • [48] CLIP: Similarity searching of 3D databases using clique detection
    Rhodes, N
    Willett, P
    Calvet, A
    Dunbar, JB
    Humblet, C
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2003, 43 (02): : 443 - 448
  • [49] SiteMine: Large-scale binding site similarity searching in protein structure databases
    Reim, Thorben
    Ehrt, Christiane
    Graef, Joel
    Guenther, Sebastian
    Meents, Alke
    Rarey, Matthias
    ARCHIV DER PHARMAZIE, 2024, 357 (05)
  • [50] DEVELOPMENT OF AN ATOM MAPPING PROCEDURE FOR SIMILARITY SEARCHING IN DATABASES OF 3-DIMENSIONAL CHEMICAL STRUCTURES
    PEPPERRELL, CA
    POIRRETTE, AR
    WILLETT, P
    TAYLOR, R
    PESTICIDE SCIENCE, 1991, 33 (01): : 97 - 111