A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents

Cited by: 16
Authors
Aphinyanaphongs, Yindalon [1 ]
Statnikov, Alexander [1 ]
Aliferis, Constantin F. [1 ]
Institution
[1] Vanderbilt Univ, Dept Biomed Informat, Eskind Biomed Lib, Discovery Syst Lab, Nashville, TN 37232 USA
Keywords
DOI
10.1197/jamia.M2031
Chinese Library Classification: TP [Automation and computer technology]
Discipline code: 0812
Abstract
Objective: The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and the gold standard against which they are evaluated) and compares their performance to citation counts, impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or a different gold standard). Design: Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors. Measurements: Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation. Results: For all three gold standards and tasks, GSS-ML filters outperformed citation counts, impact factors, and NS-ML filters. Combining content with impact factor or citation count produced no or negligible improvement over the GSS machine learning filters. Conclusions: These experiments provide evidence that information retrieval filters focused on a retrieval task and corresponding gold standard must be built specifically for that task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add marginal value to discriminatory performance. Previous research that claimed better performance of citation metrics than machine learning in one of the corpora examined here is attributed to using machine learning filters built for a different gold standard and task.
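The abstract's measurement, the area under the ROC curve, can be computed directly from classifier output scores via its rank interpretation: the probability that a randomly chosen positive article is scored above a randomly chosen negative one. A minimal stdlib sketch (the labels and scores below are hypothetical illustrations, not data from the study):

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of positive/negative pairs where the positive is
    scored higher (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical filter scores for six articles (1 = high quality).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly ≈ 0.889
```

In an n-fold cross-validation setup such as the one described above, this statistic would be computed on each held-out fold's scores and the estimates averaged.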
Pages: 446-455
Page count: 10