Comparison of Nonbinary Similarity Coefficients for Similarity Searching, Clustering and Compound Selection

Cited by: 33
Authors
Al Khalifa, Aysha [1]
Haranczyk, Maciej [1]
Holliday, John [1]
Affiliation
[1] Univ Sheffield, Dept Informat Studies, Sheffield S1 4DP, S Yorkshire, England
Keywords
CHEMICAL SIMILARITY; DATA FUSION; DISSIMILARITY; COMBINATION; STRINGS
DOI
10.1021/ci8004644
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Several recent studies have compared the relative performance of a selection of similarity coefficients when applied to chemical databases represented by binary fingerprints. Considerable variation in performance, when used for (dis)similarity-based techniques, such as similarity searching, database clustering, and dissimilarity-based compound selection, has been reported, the reasons for which are closely related to molecular size. For many of these similarity coefficients, an alternative form can be derived which is applicable to sets of nonbinary data, such as calculated or measured physicochemical properties, or counts of substructural fragments. Here we report on several studies which have been undertaken to investigate the relative performance of twelve coefficients when applied to nonbinary data using such (dis)similarity-based techniques. Results suggest that no single coefficient is appropriate for all methodologies investigated and that the size bias detected with binary data is not as apparent when the data and, hence, coefficient are nonbinary in nature.
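As an illustration of how a binary similarity coefficient generalizes to nonbinary data, the sketch below implements the widely used continuous forms of the Tanimoto and cosine coefficients for real-valued or count vectors. The abstract does not name the twelve coefficients studied, so the choice of these two is an assumption made for illustration, and the function names and the zero-vector handling are hypothetical.

```python
from math import sqrt


def tanimoto_nonbinary(x, y):
    """Continuous (nonbinary) Tanimoto coefficient.

    S = sum(x_i * y_i) / (sum(x_i^2) + sum(y_i^2) - sum(x_i * y_i))
    For 0/1 vectors this reduces to the binary form c / (a + b - c).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    denom = xx + yy - dot
    if denom == 0:
        # Both vectors are all zeros; treating them as identical is an
        # assumption of this sketch, not something stated in the source.
        return 1.0
    return dot / denom


def cosine_nonbinary(x, y):
    """Cosine coefficient: sum(x_i * y_i) / sqrt(sum(x_i^2) * sum(y_i^2))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    if xx == 0 or yy == 0:
        # Undefined for a zero vector; returning 0.0 is an assumption.
        return 0.0
    return dot / sqrt(xx * yy)


# Hypothetical fragment-count vectors for two molecules.
mol_a = [2, 0, 1]
mol_b = [1, 1, 0]
print(tanimoto_nonbinary(mol_a, mol_b))
print(cosine_nonbinary(mol_a, mol_b))
```

Applied to 0/1 vectors, both functions give the same values as their binary counterparts, which is why the same coefficient can be compared across binary fingerprints and nonbinary descriptors.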
Pages: 1193-1201
Page count: 9