Monte Carlo method for identification of outlier molecules in QSAR studies

Author: Tarko Laszlo
Affiliation: Center of Organic Chemistry “C. D. Nenitzescu”–Romanian Academy
Source: Journal of Mathematical Chemistry, 2010, Vol. 47
Keywords: Monte Carlo; Outliers; QSAR
Abstract
The paper discusses some difficulties that arise when the classical formula is applied to identify “outliers” in a given set of objects. It proposes a new Monte Carlo-like method for identifying “outliers” in the calibration set used in QSPR/QSAR computations. Sub-sets of molecules are randomly extracted thousands of times from the given calibration set. The method relies on the idea that the presence of “outlier” molecules in a sub-set decreases the prediction power of the QSAR equation derived from that sub-set. Consequently, sub-sets containing “outlier” molecules often lead to poor-quality QSAR equations and rarely to high-quality ones. The paper proposes a specific formula for an “outlier index”. The molecule with the highest outlier index is removed from the calibration set, and the identification/elimination process is repeated until the maximum value of the outlier index stops decreasing. The paper presents five examples of outlier identification on various kinds of calibration sets. We compare the results with those obtained from a classical outlier index formula, using the same calibration set, the same set of descriptors, and the same identification/elimination procedure.
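The abstract does not state the paper's outlier-index formula, so the Python sketch below substitutes a hypothetical one purely to illustrate the sampling-and-elimination loop described above. Assumptions not taken from the paper: each QSAR equation is a one-descriptor least-squares line, its quality is the fitted R², each random sub-set holds half of the molecules, and the index of molecule i is the mean R² of sub-sets that exclude i minus the mean R² of sub-sets that include it. The function names `r_squared`, `outlier_indices` and `eliminate_outliers` are illustrative only.

```python
import random
from statistics import fmean


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line y = a + b*x (assumed one-descriptor QSAR model)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        return 0.0  # degenerate sub-set: no usable fit
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


def outlier_indices(xs, ys, n_draws=2000, frac=0.5, rng=None):
    """HYPOTHETICAL index (not the paper's formula):
    mean R^2 of sub-sets without molecule i minus mean R^2 of sub-sets with i."""
    rng = rng if rng is not None else random.Random(0)
    n = len(xs)
    k = max(3, int(frac * n))  # assumed sub-set size
    total = 0.0
    sum_in = [0.0] * n  # summed R^2 of sub-sets containing molecule i
    cnt_in = [0] * n    # number of sub-sets containing molecule i
    for _ in range(n_draws):
        members = rng.sample(range(n), k)  # random sub-set of the calibration set
        q = r_squared([xs[j] for j in members], [ys[j] for j in members])
        total += q
        for j in members:
            sum_in[j] += q
            cnt_in[j] += 1
    index = []
    for i in range(n):
        cnt_out = n_draws - cnt_in[i]
        if cnt_in[i] == 0 or cnt_out == 0:
            index.append(0.0)  # molecule never (or always) sampled: no evidence
            continue
        mean_in = sum_in[i] / cnt_in[i]
        mean_out = (total - sum_in[i]) / cnt_out
        index.append(mean_out - mean_in)  # large when molecule i tends to spoil the fit
    return index


def eliminate_outliers(xs, ys, min_size=5, **kwargs):
    """Remove the top-index molecule while the maximum index keeps decreasing.
    Returns the original positions of removed molecules, in removal order."""
    ids = list(range(len(xs)))  # molecules still in the calibration set
    removed = []
    prev_max = float("inf")
    while len(ids) > min_size:  # min_size is an added safeguard, not from the paper
        idx = outlier_indices([xs[i] for i in ids], [ys[i] for i in ids], **kwargs)
        cur_max = max(idx)
        if cur_max >= prev_max:
            break  # maximum outlier index stopped decreasing
        worst = idx.index(cur_max)
        removed.append(ids.pop(worst))
        prev_max = cur_max
    return removed


if __name__ == "__main__":
    # Synthetic data: y = 2x + 1 with small deterministic noise and one planted outlier.
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
    ys[7] += 30.0
    print(eliminate_outliers(xs, ys)[0])  # expected: 7
```

The stopping rule follows one reading of the abstract: a molecule is removed only while the maximum index keeps falling from one round to the next, so at least one molecule is always removed in the first round. Swapping in the paper's actual index or a multi-descriptor regression would change only `outlier_indices` and `r_squared`.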
Pages: 174–190