ICRM: An intelligent citation recommendation mechanism based on BERT and weighted BoW models

Cited by: 0
Authors
Chang C.-Y. [1]
Yang Y.-T. [1]
Zhang Q. [1]
Lin Y.-T. [2]
Roy D.S. [3]
Affiliations
[1] Department of Computer Science and Information Engineering, Tamkang University, New Taipei
[2] Department of English, Tamkang University, New Taipei
[3] Department of Computer Science and Engineering, National Institute of Technology, Shillong
Keywords
BERT; Citation recommendation; TF-IDF; weighted bag of words
DOI
10.3233/JIFS-237975
Abstract
The field of technology has advanced rapidly, attracting an ever-growing community of researchers dedicated to developing theories and techniques. This paper proposes ICRM (Intelligent Citation Recommendation Mechanism), which automates the recommendation of an appropriate number of citations for each citation bracket in a document. ICRM comprises three phases: Coarse-grained Weighted Bag of Words (WCBW), Fine-grained SciBERT (FSB), and Citation Adjustment. First, the WCBW phase uses TF-IDF to extract keywords from both the target and candidate documents, forming vectors that capture word significance along with metadata such as authorship, keywords, and titles. Its aim is to identify relevant papers in a database to serve as initial candidates for each bracket. Second, the FSB phase uses the SciBERT model to assess the similarity between candidate documents and the local context around each bracket, refining the candidate selection and improving the precision of the recommendations. Finally, the Citation Adjustment phase handles overlapping citations and ensures that the number of recommended citations meets user-defined criteria, resolving issues of imbalance. Simulation results show that ICRM significantly outperforms existing models in precision, recall, and F1-score. © 2024 - IOS Press. All rights reserved.
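The coarse-grained retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain TF-IDF bag-of-words with smoothed IDF and cosine similarity, it does not model the metadata weighting (authorship, keywords, titles), and it omits the SciBERT and Citation Adjustment phases entirely. All function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption of this sketch, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def coarse_candidates(target, candidates, top_k=2):
    """Rank candidate texts by TF-IDF cosine similarity to the target."""
    vecs = tfidf_vectors([target] + candidates)
    scores = [(cosine(vecs[0], v), i) for i, v in enumerate(vecs[1:])]
    # Highest score first; ties broken by original candidate order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(i, round(score, 3)) for score, i in scores[:top_k]]


# Hypothetical target context and candidate abstracts.
target = "citation recommendation with BERT and TF-IDF keywords"
candidates = [
    "pretrained language models for citation recommendation",
    "word embeddings for opinion classification",
    "protein folding simulation on GPUs",
]
ranked = coarse_candidates(target, candidates, top_k=2)
print(ranked)  # list of (candidate index, similarity) pairs
```

In a fuller pipeline, the candidates kept by this step would be re-scored by a contextual model against the text around each bracket, which is the role the abstract assigns to the FSB phase.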
Pages: 10135-10150
Page count: 15