A deep learning approach for transgender and gender diverse patient identification in electronic health records

Cited by: 2

Authors:

Hua, Yining ^{[1,2,3,4,9]}

Wang, Liqin ^{[1,2]}

Nguyen, Vi ^{[1,2]}

Rieu-Werden, Meghan ^{[5]}

McDowell, Alex ^{[6,7]}

Bates, David W. ^{[1,2]}

Foer, Dinah ^{[1,2,8]}

Zhou, Li ^{[1,2]}

Affiliations:

[1] Brigham & Womens Hosp, Dept Med, Div Gen Internal Med & Primary Care, Boston, MA 02145 USA

[2] Harvard Med Sch, Boston, MA USA

[3] Harvard TH Chan Sch Publ Hlth, Dept Epidemiol, Boston, MA USA

[4] Harvard Med Sch, Dept Biomed Informat, Boston, MA USA

[5] Massachusetts Gen Hosp, Div Gen Med, Boston, MA USA

[6] Massachusetts Gen Hosp, Mongan Inst, Hlth Policy Res Inst, Boston, MA USA

[7] Harvard Med Sch, Dept Hlth Care Policy, Boston, MA USA

[8] Brigham & Womens Hosp, Dept Med, Div Allergy & Clin Immunol, Boston, MA 02145 USA

[9] Brigham & Womens Hosp, Div Gen Internal Med & Primary Care, 399 Revolut Dr,Suite 777, Somerville, MA 02145 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2023 / Volume 147

Keywords:

Gender identity; Transgender persons; Sexual and gender minorities; Electronic health records; Machine learning; Natural language processing;

DOI:

10.1016/j.jbi.2023.104507

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Background: Although accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations, it remains a challenging task due to incomplete gender information in structured EHR fields.

Objective: Using TGD identification as a case study, this research uses natural language processing (NLP) and deep learning to build an accurate predictive model of patient gender identity, aiming to tackle the challenges of identifying relevant patient-level information from EHR data and reducing annotation work.

Methods: This study included adult patients in a large healthcare system in Boston, MA, between 4/1/2017 and 4/1/2022. To identify relevant information from massive clinical notes, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. This keyword list was used to pre-screen potential TGD individuals and create two datasets for model training, testing, and validation. Dataset I was a balanced dataset that contained clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. The performance of the deep learning model was compared to traditional machine learning and rule-based algorithms.

Results: The final keyword list consists of 109 keywords, of which 58 (53.2%) were expanded by the BioWordVec model. Dataset I contained 3,150 patients (50% TGD) while Dataset II contained 200 patients (90% TGD). On Dataset I the deep learning model achieved an F1 score of 0.917, a sensitivity of 0.854, and a precision of 0.980; on Dataset II it achieved an F1 score of 0.969, a sensitivity of 0.967, and a precision of 0.972. The deep learning model significantly outperformed rule-based algorithms.

Conclusion: This is the first study to show that deep learning-integrated NLP algorithms can accurately identify gender identity using EHR data. Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms.
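
The abstract reports precision, sensitivity, and F1 without stating how they relate. As a reading aid, the sketch below uses the standard definitions, in which F1 is the harmonic mean of precision and sensitivity. The function names and the confusion counts in the usage lines are illustrative assumptions, not taken from the paper, and the paper's own evaluation code is not shown in this record.

```python
from typing import Tuple


def f1_from_rates(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (standard F1 definition)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def metrics_from_counts(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, sensitivity, F1) from confusion-matrix counts.

    tp: patients correctly flagged as positive
    fp: patients incorrectly flagged as positive
    fn: positive patients the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity, f1_from_rates(precision, sensitivity)


if __name__ == "__main__":
    # Rates reported in the abstract for Dataset II.
    print(round(f1_from_rates(0.972, 0.967), 3))

    # Hypothetical counts, for illustration only.
    print(metrics_from_counts(tp=90, fp=10, fn=30))
```

Applied to the Dataset II rates (precision 0.972, sensitivity 0.967), the formula gives about 0.969, in line with the reported F1. The identity is exact only when all three metrics come from a single confusion matrix; values averaged over folds or runs need not satisfy it.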

Pages: 10