Using content-based and bibliometric features for machine learning models to predict citation counts in the biomedical literature

Times cited: 0
Authors
Fu L.D. [1]
Aliferis C.F. [1]
Affiliation
[1] Center for Health Informatics and Bioinformatics, New York University Medical Center, 333 E. 38th St, New York, NY 10016
Keywords
Bibliometrics; Citation analysis; Information retrieval; Machine learning
DOI
10.1007/s11192-010-0160-5
Abstract
The most popular method for judging the impact of biomedical articles is the citation count, i.e., the number of citations an article has received. Its most significant limitation is that it cannot evaluate articles at the time of publication, since citations accumulate over time. This work presents computer models that accurately predict citation counts of biomedical publications over a long horizon of 10 years, using only predictive information available at publication time. Our experiments show that it is feasible to accurately predict future citation counts by applying machine learning methods to a mixture of content-based and bibliometric features. The models pave the way for practical prediction of the long-term impact of publications, and their statistical analysis provides greater insight into citation behavior. © 2010 Akadémiai Kiadó, Budapest, Hungary.
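The record does not describe the models' internals. As a rough illustration of what "a mixture of content-based and bibliometric features" can look like, here is a minimal Python sketch under several stated assumptions: the task is framed as binary classification (does an article exceed a chosen citation threshold within the horizon), content features are a binary bag of words over the article text, and the bibliometric features are three hypothetical fields (`num_authors`, `journal_impact_factor`, `num_references`) known at publication time. The learner is plain logistic regression trained by full-batch gradient descent, swapped in for simplicity; it is not necessarily the method the authors used. All function names and the toy records are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical bibliometric fields known at publication time; the paper's
# actual feature set is not listed in this record.
BIBLIO_FIELDS = ("num_authors", "journal_impact_factor", "num_references")


def tokenize(text: str) -> List[str]:
    """Lower-case whitespace tokens with simple punctuation stripped."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]


def build_vocab(texts: Sequence[str]) -> Dict[str, int]:
    """Map each distinct token in the training texts to a column index."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(article: dict, vocab: Dict[str, int]) -> List[float]:
    """Concatenate content-based and bibliometric features for one article."""
    content = [0.0] * len(vocab)  # binary bag of words over the article text
    for tok in tokenize(article["text"]):
        if tok in vocab:
            content[vocab[tok]] = 1.0
    biblio = [math.log1p(article[f]) for f in BIBLIO_FIELDS]  # log-scaled counts
    return content + biblio


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=5000) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit(articles):
    """Train on labelled articles; label 1 = citations above a chosen threshold."""
    vocab = build_vocab([a["text"] for a in articles])
    X = [featurize(a, vocab) for a in articles]
    y = [a["label"] for a in articles]
    w, b = train_logistic(X, y)
    return vocab, w, b


def predict_proba(article, vocab, w, b) -> float:
    """Estimated probability that the article exceeds the citation threshold."""
    x = featurize(article, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy records for illustration only (not data from the paper).
TOY_ARTICLES = [
    {"text": "Randomized controlled trial of statin therapy", "num_authors": 12,
     "journal_impact_factor": 30.0, "num_references": 40, "label": 1},
    {"text": "Multicenter randomized trial of antihypertensive therapy",
     "num_authors": 15, "journal_impact_factor": 25.0, "num_references": 35,
     "label": 1},
    {"text": "Case report of a rare skin lesion", "num_authors": 2,
     "journal_impact_factor": 1.0, "num_references": 8, "label": 0},
    {"text": "Letter on a rare case", "num_authors": 1,
     "journal_impact_factor": 0.5, "num_references": 3, "label": 0},
]

if __name__ == "__main__":
    vocab, w, b = fit(TOY_ARTICLES)
    for a in TOY_ARTICLES:
        print(a["label"], round(predict_proba(a, vocab, w, b), 3))
```

The log scaling keeps large raw counts (such as reference numbers) from swamping the 0/1 word features; a realistic setup would also hold out a test set and evaluate several citation thresholds rather than scoring the training records.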
Pages: 257-270
Page count: 14