Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing

Cited: 0
Authors
Pendyala, Vishnu S. [1 ]
Kamdar, Karnavee [2 ]
Mulchandani, Kapil [3 ]
Affiliations
[1] San Jose State Univ, Dept Appl Data Sci, San Jose, CA 95192 USA
[2] Oracle, Austin, TX 78741 USA
[3] Amazon, Seattle, WA 98170 USA
Keywords
machine learning; peer review; large language models; long short-term memory; support vector machines; natural language processing; BIAS;
DOI
10.3390/electronics14020256
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
Research expands the boundaries of a subject, economy, and civilization. Peer review is at the heart of research and is understandably an expensive process. This work, with a human in the loop, aims to support the research community in multiple ways: it predicts paper quality and acceptance and recommends reviewers. It helps authors and editors evaluate research work using machine learning models trained on a dataset of more than 18,000 research papers, some from highly acclaimed top conferences in Artificial Intelligence such as NeurIPS and ICLR, together with their reviews, aspect scores, and accept/reject decisions. A comprehensive system is built using machine learning algorithms such as Support Vector Machines; deep recurrent neural network architectures such as LSTM; a wide variety of pre-trained word vectors from Word2Vec, GloVe, and FastText; the transformer-based BERT and DistilBERT; Google's Large Language Model (LLM), PaLM 2; and a TF-IDF vectorizer. To make the system readily usable and to facilitate future enhancements, a frontend, a Flask server in the cloud, and a NoSQL database at the backend are implemented, making it a complete system. The work is novel in its unique blend of tools and techniques addressing most aspects of building a system to support the peer review process. The experiments achieve an 86% test accuracy on acceptance prediction using DistilBERT. Results from other models are comparable, with PaLM-based LLM embeddings achieving 84% accuracy.
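The abstract names a TF-IDF vectorizer as one of the feature-extraction methods feeding the acceptance-prediction classifiers. As a minimal sketch of that idea only (not the authors' implementation; the toy documents and the standard add-one smoothing are assumptions), TF-IDF weighting can be computed as follows:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights (with add-one IDF smoothing) for tokenized documents."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    # Smoothed inverse document frequency, as in common implementations
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency within the document
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

# Toy corpus standing in for paper/review texts
docs = [
    "peer review quality prediction".split(),
    "peer review reviewer recommendation".split(),
]
vecs = tfidf(docs)
# Terms shared across documents ("peer", "review") receive lower weight
# than terms unique to one document ("quality").
assert vecs[0]["quality"] > vecs[0]["peer"]
```

In the system described, such sparse vectors would be passed to a classifier such as an SVM; the dense alternatives (Word2Vec, GloVe, FastText, BERT/DistilBERT, PaLM embeddings) replace this weighting step with learned representations.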
Pages: 26