Exploring the evolution of research topics during the COVID-19 pandemic

Cited by: 4
Authors
Invernici, Francesco [1]
Bernasconi, Anna [1]
Ceri, Stefano [1]
Affiliation
[1] Politecnico di Milano, Department of Electronics, Information and Bioengineering, Milan, Italy
Keywords
Research data; Scientific literature; Natural language processing; Topic modeling; COVID-19; Time series
DOI
10.1016/j.eswa.2024.124028
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The COVID-19 pandemic has changed the research agendas of most scientific communities, resulting in an overwhelming production of research articles in a variety of domains, including medicine, virology, epidemiology, economics, and psychology. Several open-access corpora and literature hubs were established; among them, the COVID-19 Open Research Dataset (CORD-19) has systematically gathered scientific contributions for 2.5 years by collecting and indexing over one million articles. This corpus, however, does not provide an easy-to-access overview of its content. Here, we present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based on a careful selection of up-to-date technologies (including large language models), resulting in an architecture for clustering articles along orthogonal dimensions, together with extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard that provides fast, one-click visualization of topic contents as word clouds and of topic trends as time series, equipped with easy-to-use statistical testing for analyzing the significance of topic emergence over arbitrarily selected time windows. Overall, our pipeline is very fast, and its topic identification results match our expectations (F1-score 0.854). The data preparation and result visualization processes are completely general and applicable to virtually any corpus of textual documents; they are thus suited for effective adaptation to other contexts.
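The abstract mentions topic trends shown as time series and a statistical test for whether a topic "emerges" between two user-selected time windows, but it does not say which test CORToViz uses. The sketch below is only an illustration under stated assumptions: each article has already been assigned a month and a topic ID (the input format and all function names are hypothetical), the trend is the monthly fraction of articles in a topic, and emergence is checked with a one-sided two-proportion z-test, a swapped-in standard technique rather than the paper's confirmed method.

```python
import math
from collections import Counter

# Hypothetical input format: one (month, topic_id) pair per article,
# e.g. produced by any upstream clustering/topic-assignment step.

def topic_share_series(assignments):
    """Return {month: {topic_id: share}}, where share is the fraction of
    that month's articles assigned to the topic (the topic trend)."""
    totals = Counter(month for month, _ in assignments)
    per_topic = Counter(assignments)  # counts each (month, topic) pair
    series = {}
    for (month, topic), n in per_topic.items():
        series.setdefault(month, {})[topic] = n / totals[month]
    return series

def window_counts(assignments, topic, months):
    """Count articles on `topic` (k) and all articles (n) within a time window."""
    in_window = [t for m, t in assignments if m in months]
    return sum(1 for t in in_window if t == topic), len(in_window)

def emergence_z_test(k_a, n_a, k_b, n_b):
    """One-sided two-proportion z-test (an assumption, not taken from the paper):
    is the topic's share in window B larger than in window A?
    Returns (z, p_value)."""
    p_a, p_b = k_a / n_a, k_b / n_b
    pooled = (k_a + k_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("topic share is 0% or 100% in both windows; test undefined")
    z = (p_b - p_a) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)
    return z, p_value

if __name__ == "__main__":
    # Toy data invented for illustration: topic 1 grows from 10% to 30% of articles.
    assignments = ([("2020-01", 0)] * 90 + [("2020-01", 1)] * 10
                   + [("2020-06", 0)] * 70 + [("2020-06", 1)] * 30)
    print(topic_share_series(assignments))
    k_a, n_a = window_counts(assignments, 1, {"2020-01"})
    k_b, n_b = window_counts(assignments, 1, {"2020-06"})
    z, p = emergence_z_test(k_a, n_a, k_b, n_b)
    print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

On the toy data the test reports a large positive z for topic 1. With real topic assignments, the choice of test, and any correction for testing many topics at once, would need to follow the paper itself.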
Pages: 12