Generation and evaluation of artificial mental health records for Natural Language Processing

Cited by: 0

Authors:

Julia Ive

Natalia Viani

Joyce Kam

Lucia Yin

Somain Verma

Stephen Puntis

Rudolf N. Cardinal

Angus Roberts

Robert Stewart

Sumithra Velupillai

Affiliations:

[1] Imperial College London, Department of Computing

[2] King’s College London, IoPPN

[3] University of Oxford, Department of Psychiatry

[4] Warneford Hospital, Department of Psychiatry

[5] University of Cambridge, Cambridge Biomedical Campus

[6] Cambridgeshire and Peterborough NHS Foundation Trust

[7] South London and Maudsley NHS Foundation Trust

Source:

npj Digital Medicine, Volume 3

Abstract:

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
