Toward Automated Interpretation of LC-MS Data for Quality Assurance of a Screening Collection

Cited by: 1
Authors
Addison, Daniel H. [1]
Affiliations
[1] AstraZeneca, Screening Sci & Sample Management, Bldg 310, Cambridge Sci Pk, Milton Rd, Cambridge CB4 0FZ, England
Source
JALA | 2016 / Vol. 21 / Issue 06
Keywords
LC-MS; data mining; random forests; WEKA; Pipeline Pilot; DECISION TREE CLASSIFICATION; MASS-SPECTROMETRY; PROTEINS;
DOI
10.1177/2211068215620765
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The AstraZeneca Compound Management group uses high-performance liquid chromatography-mass spectrometry for structure elucidation and purity determination of the AstraZeneca compound collection. These activities are conducted in a high-throughput environment where the rate-limiting step is the review and interpretation of analytical results, which is time-consuming and experience-dependent. Despite the development of a semiautomated review system, manual interpretation of results remains a bottleneck. Data-mining techniques were applied to archived data to further automate the review process. Various classification models were evaluated using WEKA and Pipeline Pilot (Pipeline Pilot version 8.5.0.200, BIOVIA, San Diego, CA). Results were assessed using criteria including precision, recall, and receiver operating characteristic area. Each model was evaluated as a cost-insensitive classifier and again using MetaCost to apply cost sensitivity. Pruning and variable importance were also investigated. A 10-tree random forest generated with Pipeline Pilot reduced the proportion of analyses requiring manual review to 36.4% using a threshold of 90% confidence in predictions. This represents a 45% reduction in manual reviews compared with the previous system, delivering an annual savings of $45,000 or an increase in capacity from 25,000 analyses per month up to 45,000 with the same resource levels.
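The confidence-threshold step described in the abstract can be illustrated with a minimal sketch. This is not the author's Pipeline Pilot workflow: the function names are hypothetical, "confidence" is assumed to mean the largest predicted class probability (for a 10-tree forest, the share of trees voting for the winning class), and the capacity estimate assumes that throughput is limited only by the number of manual reviews.

```python
from typing import List, Sequence, Tuple


def route_by_confidence(
    class_probs: Sequence[Sequence[float]], threshold: float = 0.90
) -> Tuple[List[int], List[int]]:
    """Split analyses into auto-accepted and manual-review index lists.

    Assumption: a prediction is accepted automatically when its largest
    class probability is at least `threshold`.
    """
    auto: List[int] = []
    manual: List[int] = []
    for i, probs in enumerate(class_probs):
        if max(probs) >= threshold:
            auto.append(i)
        else:
            manual.append(i)
    return auto, manual


def manual_review_fraction(
    class_probs: Sequence[Sequence[float]], threshold: float = 0.90
) -> float:
    """Fraction of analyses that still need a human reviewer."""
    _, manual = route_by_confidence(class_probs, threshold)
    return len(manual) / len(class_probs)


def capacity_at_same_effort(current_capacity: float, reduction: float) -> float:
    """Estimated capacity if manual reviews fall by `reduction` (0 to 1).

    Assumption: capacity scales inversely with the manual-review workload.
    """
    return current_capacity / (1.0 - reduction)


# Illustrative vote shares from a hypothetical 10-tree forest (made-up data).
example_probs = [(1.0, 0.0), (0.9, 0.1), (0.6, 0.4), (0.2, 0.8), (0.0, 1.0)]
auto_idx, manual_idx = route_by_confidence(example_probs)
estimated_capacity = capacity_at_same_effort(25_000, 0.45)
```

Under these assumptions, a 45% reduction in manual reviews applied to 25,000 analyses per month gives an estimate of about 45,000, which is consistent with the capacity figure quoted in the abstract.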
Pages: 743-755
Page count: 13