Identifying stigmatizing and positive/preferred language in obstetric clinical notes using natural language processing

Cited by: 1
Authors
Scroggins, Jihye Kim [1 ]
Hulchafo, Ismael I. [1 ]
Harkins, Sarah [1 ]
Scharp, Danielle [2 ]
Moen, Hans [3 ]
Davoudi, Anahita [4 ]
Cato, Kenrick [5 ]
Tadiello, Michele [6 ]
Topaz, Maxim [1 ]
Barcelona, Veronica [1 ]
Affiliations
[1] Columbia Univ, Sch Nursing, 560 W 168th St, New York, NY 10032 USA
[2] Icahn Sch Med Mt Sinai, New York, NY 10029 USA
[3] Aalto Univ, Dept Comp Sci, Espoo 02150, Finland
[4] VNS Hlth, New York, NY 10017 USA
[5] Univ Penn, Sch Nursing, Philadelphia, PA 19104 USA
[6] Columbia Univ, Ctr Community Engaged Hlth Informat & Data Sci, Irving Med Ctr, New York, NY 10032 USA
Keywords
natural language processing; electronic health records; health communication; bias; nursing informatics; AGREEMENT;
DOI
10.1093/jamia/ocae290
Chinese Library Classification
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
Objective: To identify stigmatizing language in obstetric clinical notes using natural language processing (NLP).
Materials and methods: We analyzed electronic health records from birth admissions in the Northeast United States in 2017. We annotated 1771 clinical notes to generate the initial gold standard dataset, labeling exemplars of 5 stigmatizing and 1 positive/preferred language categories. We then used a semantic similarity-based search approach to expand the initial dataset with additional exemplars, composing an enhanced dataset. We employed traditional classifiers (Support Vector Machine, Decision Trees, and Random Forest) and transformer-based models: ClinicalBERT and BERT base (Bidirectional Encoder Representations from Transformers). Models were trained and validated on the initial and enhanced datasets and tested on the enhanced testing dataset.
Results: In the initial dataset, we annotated 963 exemplars as stigmatizing or positive/preferred. The most frequently identified category was marginalized language/identities (n = 397, 41%), and the least frequent was questioning patient credibility (n = 51, 5%). The semantic similarity-based search added 502 exemplars, increasing the counts in low-frequency categories. All NLP models also showed improved performance, with Decision Trees demonstrating the greatest improvement (21%). ClinicalBERT outperformed the other models, with the highest average F1-score of 0.78.
Discussion: ClinicalBERT appears to most effectively capture the nuanced, context-dependent stigmatizing language found in obstetric clinical notes, suggesting potential clinical applications such as real-time monitoring and alerts to discourage the use of stigmatizing language and reduce healthcare bias. Future research should explore stigmatizing language in diverse geographic locations and clinical settings to further contribute to high-quality and equitable perinatal care.
Conclusion: ClinicalBERT effectively captures the nuanced stigmatizing language in obstetric clinical notes. Our semantic similarity-based search approach for rapidly extracting additional exemplars enhanced model performance while reducing the need for labor-intensive annotation.
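The semantic similarity-based exemplar search described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the paper's actual embedding model and similarity threshold are not given here, so this stand-in scores candidate note sentences against labeled seed exemplars with bag-of-words cosine similarity (a real system would more likely use sentence embeddings from a clinical BERT-style encoder), and the seed/candidate sentences and threshold are hypothetical.

```python
# Hedged sketch: expanding a labeled exemplar set by similarity search.
# Bag-of-words cosine similarity is a simplified stand-in for the semantic
# (embedding-based) similarity used in the study; threshold is hypothetical.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_exemplars(seeds, candidates, threshold=0.5):
    """Return candidate sentences similar enough to any labeled seed exemplar;
    in the study, matches like these were added to form the enhanced dataset."""
    seed_vecs = [Counter(s.lower().split()) for s in seeds]
    kept = []
    for cand in candidates:
        cand_vec = Counter(cand.lower().split())
        if max(cosine(cand_vec, sv) for sv in seed_vecs) >= threshold:
            kept.append(cand)
    return kept

# Hypothetical seeds/candidates for illustration only.
seeds = ["patient refused medication", "patient was noncompliant"]
candidates = ["patient refused all medication", "vital signs stable overnight"]
print(expand_exemplars(seeds, candidates))  # → ['patient refused all medication']
```

Candidates that clear the threshold would still be reviewed by annotators before joining the gold standard, which is how such expansion reduces (rather than replaces) manual labeling effort.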
Pages: 308-317
Page count: 10