Translation and Expansion: Enabling Laypeople Access to the COVID-19 Academic Collection

Cited by: 2
Authors
He D. [1]
Wang Z. [1]
Thaker K. [1]
Zou N. [1]
Affiliation
[1] School of Computing and Information, University of Pittsburgh, Pittsburgh, PA
Keywords
consumer health vocabulary; COVID-19; information retrieval; laypeople; translation and expansion process
DOI
10.2478/dim-2020-0011
Abstract
Academic collections, such as the COVID-19 Open Research Dataset (CORD-19), contain a large number of scholarly articles on COVID-19 and related viruses. These articles represent the latest developments across various disciplines in combating the COVID-19 pandemic. However, laypeople find it difficult to access these articles because of the term mismatch problem caused by their limited medical knowledge. In this article, we present an effort to help laypeople access the CORD-19 collection by translating and expanding their keywords into the corresponding medical terminology using the National Library of Medicine's Consumer Health Vocabulary. We then developed a retrieval system, Search engine for Laypeople to access the COVID-19 literature (SLAC), using open-source software. Using the Centers for Disease Control and Prevention's FAQs as the basis for developing common questions that laypeople might be interested in, we performed a set of experiments to test the SLAC system and the translation and expansion (T&E) process. Our experimental results demonstrate that the T&E process indeed helped to overcome the term mismatch problem and mapped laypeople's terms to the medical terms used in the academic articles. However, we also found that not all laypeople's search topics are meaningful to search on the CORD-19 collection. This indicates both the scope and the limitations of enabling laypeople to search academic article collections for high-quality information. © 2020 Daqing He et al., published by Sciendo
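To make the translation and expansion (T&E) idea concrete, the following is a minimal sketch of dictionary-based query expansion, assuming a lay-to-medical term mapping in the spirit of the Consumer Health Vocabulary. The mapping `TOY_CHV`, the function `translate_and_expand`, and the greedy longest-match strategy are all hypothetical illustrations; they are not taken from the CHV files or from the SLAC implementation, and the record does not specify how SLAC performs the mapping.

```python
from typing import Dict, List

# Hypothetical toy lay-to-medical mapping. These entries are illustrative only
# and are NOT taken from the actual NLM Consumer Health Vocabulary files.
TOY_CHV: Dict[str, List[str]] = {
    "fever": ["pyrexia"],
    "shortness of breath": ["dyspnea"],
    "runny nose": ["rhinorrhea"],
}


def translate_and_expand(query: str, chv: Dict[str, List[str]],
                         max_ngram: int = 3) -> List[str]:
    """Return the query's terms, each mapped lay term followed by its medical terms.

    Uses greedy longest-match over word n-grams so that multi-word lay
    phrases such as "shortness of breath" are matched as a single unit.
    """
    words = query.lower().split()
    expanded: List[str] = []
    i = 0
    while i < len(words):
        matched = False
        # Try the longest candidate phrase first, down to single words.
        for n in range(min(max_ngram, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in chv:
                expanded.append(phrase)       # keep the original lay term
                expanded.extend(chv[phrase])  # add its medical counterparts
                i += n
                matched = True
                break
        if not matched:
            expanded.append(words[i])         # unmapped word passes through
            i += 1
    # Drop duplicate terms while preserving their first-seen order.
    return list(dict.fromkeys(expanded))


if __name__ == "__main__":
    print(translate_and_expand("fever and shortness of breath", TOY_CHV))
```

Keeping the original lay terms alongside the medical ones (expansion rather than pure replacement) means the expanded query can still match documents that use everyday wording, while the added terms address the term mismatch problem. A real system would also need stop-word handling and a much larger vocabulary; neither is modeled here.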
Pages: 177-190
Page count: 13