Generation and evaluation of artificial mental health records for Natural Language Processing

Authors
Julia Ive
Natalia Viani
Joyce Kam
Lucia Yin
Somain Verma
Stephen Puntis
Rudolf N. Cardinal
Angus Roberts
Robert Stewart
Sumithra Velupillai
Affiliations
[1] Imperial College London, Department of Computing
[2] King’s College London, IoPPN
[3] University of Oxford, Department of Psychiatry
[4] Warneford Hospital, Department of Psychiatry
[5] University of Cambridge, Cambridge Biomedical Campus
[6] Cambridgeshire and Peterborough NHS Foundation Trust
[7] South London and Maudsley NHS Foundation Trust
Source
npj Digital Medicine, vol. 3
Abstract
A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
Related papers
50 in total
  • [31] Development of a natural language processing algorithm to detect chronic cough in electronic health records
    Bali, Vishal
    Weaver, Jessica
    Turzhitsky, Vladimir
    Schelfhout, Jonathan
    Paudel, Misti L.
    Hulbert, Erin
    Peterson-Brandt, Jesse
    Currie, Anne-Marie Guerra
    Bakka, Dylan
    BMC PULMONARY MEDICINE, 2022, 22 (01)
  • [32] Cohort design and natural language processing to reduce bias in electronic health records research
    Khurshid, Shaan
    Reeder, Christopher
    Harrington, Lia X.
    Singh, Pulkit
    Sarma, Gopal
    Friedman, Samuel F.
    Di Achille, Paolo
    Diamant, Nathaniel
    Cunningham, Jonathan W.
    Turner, Ashby C.
    Lau, Emily S.
    Haimovich, Julian S.
    Al-Alusi, Mostafa A.
    Wang, Xin
    Klarqvist, Marcus D. R.
    Ashburner, Jeffrey M.
    Diedrich, Christian
    Ghadessi, Mercedeh
    Mielke, Johanna
    Eilken, Hanna M.
    McElhinney, Alice
    Derix, Andrea
    Atlas, Steven J.
    Ellinor, Patrick T.
    Philippakis, Anthony A.
    Anderson, Christopher D.
    Ho, Jennifer E.
    Batra, Puneet
    Lubitz, Steven A.
    NPJ DIGITAL MEDICINE, 2022, 5 (01)
  • [33] NATURAL LANGUAGE PROCESSING METHODS ENHANCE MACE IDENTIFICATION FROM ELECTRONIC HEALTH RECORDS
    St Laurent, S.
    Guo, M.
    Alfonso, R.
    Okoro, T.
    Johansen, K.
    Dember, L.
    Lindsay, A.
    VALUE IN HEALTH, 2018, 21 : S217 - S217
  • [34] Natural language processing for electronic health records in anaesthesiology: an introduction to clinicians with recommendations and pitfalls
    Martin Bernstorff
    Simon Tilma Vistisen
    Kenneth C. Enevoldsen
    Journal of Clinical Monitoring and Computing, 2024, 38 : 241 - 245
  • [35] Development of a natural language processing algorithm to detect chronic cough in electronic health records
    Vishal Bali
    Jessica Weaver
    Vladimir Turzhitsky
    Jonathan Schelfhout
    Misti L. Paudel
    Erin Hulbert
    Jesse Peterson-Brandt
    Anne-Marie Guerra Currie
    Dylan Bakka
    BMC Pulmonary Medicine, 22
  • [36] Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Health Records
    Jeffrey Thompson
    Jinxiang Hu
    Dinesh Pal Mudaranthakam
    David Streeter
    Lisa Neums
    Michele Park
    Devin C. Koestler
    Byron Gajewski
    Roy Jensen
    Matthew S. Mayo
    Scientific Reports, 9
  • [37] Ascertainment of Delirium Status Using Natural Language Processing From Electronic Health Records
    Fu, Sunyang
    Lopes, Guilherme S.
    Pagali, Sandeep R.
    Thorsteinsdottir, Bjoerg
    LeBrasseur, Nathan K.
    Wen, Andrew
    Liu, Hongfang
    Rocca, Walter A.
    Olson, Janet E.
    St Sauver, Jennifer
    Sohn, Sunghwan
    JOURNALS OF GERONTOLOGY SERIES A-BIOLOGICAL SCIENCES AND MEDICAL SCIENCES, 2022, 77 (03) : 524 - 530
  • [38] Colonoscopy quality, quality measures, and a natural language processing tool for electronic health records
    Deutsch, John C.
    GASTROINTESTINAL ENDOSCOPY, 2012, 75 (06) : 1240 - 1242
  • [39] Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Health Records
    Thompson, Jeffrey
    Hu, Jinxiang
    Mudaranthakam, Dinesh Pal
    Streeter, David
    Neums, Lisa
    Park, Michele
    Koestler, Devin C.
    Gajewski, Byron
    Jensen, Roy
    Mayo, Matthew S.
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [40] Using a natural language processing toolkit to classify electronic health records by psychiatric diagnosis
    Hutto, Alissa
    Zikry, Tarek M.
    Bohac, Buck
    Rose, Terra
    Staebler, Jasmine
    Slay, Janet
    Cheever, C. Ray
    Kosorok, Michael R.
    Nash, Rebekah P.
    HEALTH INFORMATICS JOURNAL, 2024, 30 (04)