A Policy-Driven Approach to Secure Extraction of COVID-19 Data From Research Papers

Cited by: 6
Authors
Elluri, Lavanya [1]
Piplai, Aritran [2]
Kotal, Anantaa [2]
Joshi, Anupam [2]
Joshi, Karuna Pande [1]
Affiliations
[1] Univ Maryland Baltimore Cty, IS Dept, Baltimore, MD 21228 USA
[2] Univ Maryland Baltimore Cty, CSEE Dept, Baltimore, MD 21228 USA
Source
FRONTIERS IN BIG DATA | 2021 / Volume 4
Keywords
COVID-19; knowledge graph; privacy; UMLS; HIPAA; named entity recognition
DOI
10.3389/fdata.2021.701966
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The entire scientific and academic community has been mobilized to gain a better understanding of the COVID-19 disease and its impact on humanity. Most research related to COVID-19 needs to analyze large amounts of data in very little time. This urgency has made Big Data Analysis, and related questions around the privacy and security of the data, an extremely important part of research in the COVID-19 era. The White House OSTP has, for example, released a large dataset of papers related to COVID research from which the research community can extract knowledge and information. We show an example system with a machine learning-based knowledge extractor which draws out key medical information from COVID-19 related academic research papers. We represent this knowledge in a Knowledge Graph that uses the Unified Medical Language System (UMLS). However, publicly available studies rely on datasets that might contain sensitive data. Extracting information from academic papers can potentially leak sensitive data, and protecting the security and privacy of this data is equally important. In this paper, we address the key challenges around the privacy and security of such information extraction and analysis systems. Regulations like HIPAA have updated their guidelines for securely accessing data, specifically data related to COVID-19. In the US, healthcare providers must also comply with the Office of Civil Rights (OCR) rules to protect data integrity in matters like plasma donation, media access to health care data, telehealth communications, etc. Privacy policies are typically short and unstructured HTML or PDF documents. We have created a framework to extract relevant knowledge from the health centers' policy documents and also represent these as a knowledge graph.
Our framework helps to understand the extent to which individual provider policies comply with regulations and define access control policies that enforce the regulation rules on data in the knowledge graph extracted from COVID-related papers. Along with being compliant, privacy policies must also be transparent and easily understood by the clients. We analyze the relative readability of healthcare privacy policies and discuss the impact. In this paper, we develop a framework for access control decisions that uses policy compliance information to securely retrieve COVID data. We show how policy compliance information can be used to restrict access to COVID-19 data and information extracted from research papers.
Pages: 13