Machine Learning-Based Approach for Identifying Research Gaps: COVID-19 as a Case Study

Cited by: 4
Authors
Abd-alrazaq, Alaa [1]
Nashwan, Abdulqadir J. [2]
Shah, Zubair [3]
Abujaber, Ahmad [4]
Alhuwail, Dari [5,6]
Schneider, Jens [3]
AlSaad, Rawan [1]
Ali, Hazrat [7]
Alomoush, Waleed [8]
Ahmed, Arfan [1]
Aziz, Sarah [1]
Affiliations
[1] Weill Cornell Med Qatar, AI Ctr Precis Hlth, A031,A1 Luqta St, Doha 23435, Qatar
[2] Hamad Med Corp, Dept Nursing, Doha, Qatar
[3] Hamad Bin Khalifa Univ, Coll Sci & Engn, Div Informat & Comp Technol, Doha, Qatar
[4] Hamad Med Corp, Nursing Dept, Doha, Qatar
[5] Kuwait Univ, Coll Life Sci, Informat Sci Dept, Kuwait, Kuwait
[6] Dasman Diabet Inst, Hlth Informat Unit, Kuwait, Kuwait
[7] Sohar Univ, Fac Comp & Informat Technol, Sohar, Oman
[8] Skyline Univ Coll, Sch Informat Technol, Sharjah, U Arab Emirates
Keywords
research gaps; research gap; research topic; research topics; scientific literature; literature review; machine learning; COVID-19; BERTopic; topic clustering; text analysis; BERT; NLP; natural language processing; review methods; review methodology; SARS-CoV-2; coronavirus; Covid
DOI
10.2196/49411
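Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Discipline classification code
Abstract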
Background: Research gaps are unanswered questions in the existing body of knowledge that arise from either a lack of studies or inconclusive results. They are essential starting points and motivation for scientific research. Traditional methods for identifying research gaps, such as literature reviews and expert opinions, can be time-consuming, labor-intensive, and prone to bias. They may also fall short when dealing with rapidly evolving or time-sensitive subjects. Thus, innovative, scalable approaches are needed to identify research gaps, systematically assess the literature, and prioritize areas for further study in the topic of interest.
Objective: In this paper, we propose a machine learning-based approach for identifying research gaps through the analysis of scientific literature. We used the COVID-19 pandemic as a case study.
Methods: We conducted an analysis to identify research gaps in COVID-19 literature using the COVID-19 Open Research Dataset (CORD-19), which comprises 1,121,433 papers related to the COVID-19 pandemic. Our approach is based on the BERTopic topic modeling technique, which leverages transformers and class-based term frequency-inverse document frequency to create dense clusters that yield easily interpretable topics. Our BERTopic-based approach involves 3 stages: embedding documents, clustering documents (dimension reduction and clustering), and representing topics (generating candidates and maximizing candidate relevance).
Results: After applying the study selection criteria, we included 33,206 abstracts in the analysis of this study. The final list of research gaps identified 21 different areas, which were grouped into 6 principal topics. These topics were: "virus of COVID-19," "risk factors of COVID-19," "prevention of COVID-19," "treatment of COVID-19," "health care delivery during COVID-19," and "impact of COVID-19." The most prominent topic, observed in over half of the analyzed studies, was "the impact of COVID-19."
Conclusions: The proposed machine learning-based approach has the potential to identify research gaps in scientific literature. This study is not intended to replace individual literature research within a selected topic. Instead, it can serve as a guide to formulate precise literature search queries in specific areas associated with research questions that previous publications have earmarked for future exploration. Future research should leverage an up-to-date list of studies retrieved from the most common databases in the target area. When feasible, full texts or, at minimum, discussion sections should be analyzed rather than limiting the analysis to abstracts. Furthermore, future studies could evaluate more efficient modeling algorithms, especially those combining topic modeling with statistical uncertainty quantification, such as conformal prediction.
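The topic representation stage named in the Methods relies on class-based term frequency-inverse document frequency (c-TF-IDF). The following is a minimal sketch of that weighting in plain Python, not the authors' implementation. It assumes the weighting described in the BERTopic documentation (term frequency normalized within a cluster, multiplied by log(1 + A / f_t), where A is the average number of words per cluster and f_t is the frequency of term t across all clusters). The function names `class_tfidf` and `top_terms`, the whitespace tokenizer, and the toy clusters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def class_tfidf(docs_by_cluster):
    """Score terms per cluster with a c-TF-IDF style weighting (illustrative sketch).

    docs_by_cluster maps a cluster id to the list of documents assigned to it.
    Returns {cluster id: {term: score}}.
    """
    # Treat every cluster as one long document and count its terms.
    # Assumption: naive lowercase whitespace tokenization, no stop-word removal.
    tf = {
        cluster: Counter(word for doc in docs for word in doc.lower().split())
        for cluster, docs in docs_by_cluster.items()
    }

    # Frequency of each term across all clusters.
    total = Counter()
    for counts in tf.values():
        total.update(counts)

    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(tf)

    scores = {}
    for cluster, counts in tf.items():
        cluster_size = sum(counts.values())
        scores[cluster] = {
            term: (n / cluster_size) * math.log(1 + avg_words / total[term])
            for term, n in counts.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        cluster: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for cluster, s in scores.items()
    }


if __name__ == "__main__":
    # Hypothetical toy clusters standing in for the output of the clustering stage.
    toy_clusters = {
        "vaccine": ["vaccine efficacy trial", "vaccine dose trial"],
        "mental": ["anxiety during lockdown", "lockdown anxiety survey"],
    }
    print(top_terms(class_tfidf(toy_clusters), k=2))
```

Terms that are frequent in one cluster but rare elsewhere receive the highest scores, which is what makes the resulting topic labels interpretable. In the full pipeline described above, the clusters would come from the embedding and clustering stages rather than being hand-assigned as in this toy example.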
Pages: 12