Enhancing an enterprise data warehouse for research with data extracted using natural language processing

Times cited: 0
Authors
Magoc, Tanja [1]
Everson, Russell [2]
Harle, Christopher A. [3]
Affiliations
[1] Univ Florida, Coll Med, Gainesville, FL 32611 USA
[2] UF Hlth, Gainesville, FL USA
[3] IUPUI, Richard M Fairbanks Sch Publ Hlth, Indianapolis, IN USA
Keywords
Natural language processing; enterprise data warehouse for research; electronic health records; data service; smoking behavior; rule-based; ETL; CESSATION; SYSTEMS
DOI
10.1017/cts.2023.575
Chinese Library Classification
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Objective: This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, so that discrete data derived from clinical notes can be made broadly available for research use without the need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to the EDW4R. Materials and methods: Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify the value it adds to the structured data. Performance was assessed by manual review. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. Results: Two weeks were needed to enhance the EDW4R with data from 250 million clinical notes. NLP generated 16% and 39% increases in data availability for two variables. Discussion: Our architecture is highly generalizable to a new NLP model. The positive predictive value obtained by an independent team showed only slightly lower NLP performance than the values reported by the NLP developers. The NLP added significant value to the data already available in structured format. Conclusion: Given the value added by data extracted using NLP, it is important to enhance the EDW4R with these data so that research teams without NLP expertise can benefit from NLP models.
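The two quantities reported in the abstract, positive predictive value from manual review and the increase in data availability, can be illustrated with a minimal sketch. The function names, the inputs, and the definition of "increase in data availability" used here (patients newly covered by NLP output, relative to patients already covered by structured data) are assumptions made for illustration; the abstract does not state the exact definitions the authors used.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP) over manually reviewed NLP extractions."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("no reviewed extractions")
    return true_positives / reviewed


def availability_increase(structured_ids: set, nlp_ids: set) -> float:
    """Relative increase in patients with a recorded value after adding NLP output.

    Assumed definition (not taken from the paper): patients found only by NLP,
    divided by patients who already had the value in structured data.
    """
    if not structured_ids:
        raise ValueError("no patients with structured data")
    newly_covered = nlp_ids - structured_ids
    return len(newly_covered) / len(structured_ids)


if __name__ == "__main__":
    # Hypothetical counts and patient IDs, for illustration only.
    print(positive_predictive_value(true_positives=90, false_positives=10))
    print(availability_increase({1, 2, 3, 4, 5}, {4, 5, 6}))
```

With these hypothetical inputs, one patient (ID 6) is covered only by NLP, so the second call measures one newly covered patient against five already covered.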
Pages: 8