Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection

Cited by: 0
Authors
da Silva, Rildo Pinto [1 ,2 ]
Pollettini, Juliana Tarossi [2 ]
Pazin Filho, Antonio [1 ]
Affiliations
[1] Rua Aimbere 233,Apto 21, BR-05018010 Sao Paulo, SP, Brazil
[2] Univ Sao Paulo, Fac Med Ribeirao Preto, Ribeirao Preto, Brazil
Source
CADERNOS DE SAUDE PUBLICA | 2023 / Volume 39 / Issue 11
Keywords
COVID-19; Natural Language Processing; Health Care; Selection Criteria; Proprietary Health Facilities; KNOWLEDGE DISCOVERY; FUTURE;
DOI
10.1590/0102-311XPT243722
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Patients with post-COVID-19 syndrome benefit from health promotion programs. Their rapid identification is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially in pandemics. A descriptive observational study was carried out using 105,008 prior authorizations paid by a private health care provider, applying an unsupervised natural language processing method based on topic modeling to identify patients suspected of being infected by COVID-19. A total of 6 models were generated: 3 using the BERTopic algorithm and 3 Word2Vec models. The BERTopic model automatically creates disease groups. In the Word2Vec model, manual analysis of the first 100 cases of each topic was necessary to define the topics related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic and without word treatment selected more severe patients, with an average cost per paid prior authorization of BRL 10,206 and a total expenditure of BRL 20.3 million (5.4%) in 1,987 prior authorizations (1.9%). It had 70% accuracy compared to human analysis and 20% of cases with potential interest, all subject to analysis for inclusion in a health promotion program. It had an important loss of cases when compared to the traditional research model with structured language, and it identified other groups of diseases: orthopedic, mental, and cancer. The BERTopic model served as an exploratory method to be used in case labeling and subsequent application in supervised models. The automatic identification of other diseases raises ethical questions about the treatment of health information by machine learning.
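The per-topic summary reported in the abstract (authorization count, total expenditure, average cost) can be illustrated with a minimal sketch. This is not the authors' code: the function `summarize_topics`, the `(topic_id, cost)` record layout, and the toy data are hypothetical, and reading "more than 1,000 authorizations per topic" as a simple size filter is an assumption. The last lines only check that the figures quoted in the abstract are mutually consistent.

```python
from collections import defaultdict


def summarize_topics(records, min_size):
    """Summarize cost per topic (hypothetical helper, not from the study).

    records: iterable of (topic_id, cost_brl) pairs, one per prior authorization.
    Returns {topic_id: (count, total_cost, mean_cost)} for every topic that
    has more than `min_size` authorizations.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for topic_id, cost in records:
        totals[topic_id][0] += 1
        totals[topic_id][1] += cost
    return {
        topic: (count, total, total / count)
        for topic, (count, total) in totals.items()
        if count > min_size
    }


# Toy data, invented for illustration only.
toy = [(0, 100.0), (0, 300.0), (0, 200.0), (1, 50.0)]
summary = summarize_topics(toy, min_size=2)

# Consistency check on the figures reported in the abstract.
n_selected, n_total, mean_cost_brl = 1987, 105008, 10206
share_pct = round(100 * n_selected / n_total, 1)
implied_total_millions = round(n_selected * mean_cost_brl / 1e6, 1)
```

With the reported values, 1,987 of 105,008 authorizations is about 1.9%, and 1,987 authorizations at an average of BRL 10,206 give roughly BRL 20.3 million, matching the abstract.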
Pages: 28