A deep learning approach for transgender and gender diverse patient identification in electronic health records

Cited by: 2
Authors
Hua, Yining [1, 2, 3, 4, 9]
Wang, Liqin [1, 2]
Nguyen, Vi [1, 2]
Rieu-Werden, Meghan [5]
McDowell, Alex [6, 7]
Bates, David W. [1, 2]
Foer, Dinah [1, 2, 8]
Zhou, Li [1, 2]
Affiliations
[1] Brigham & Womens Hosp, Dept Med, Div Gen Internal Med & Primary Care, Boston, MA 02145 USA
[2] Harvard Med Sch, Boston, MA USA
[3] Harvard TH Chan Sch Publ Hlth, Dept Epidemiol, Boston, MA USA
[4] Harvard Med Sch, Dept Biomed Informat, Boston, MA USA
[5] Massachusetts Gen Hosp, Div Gen Med, Boston, MA USA
[6] Massachusetts Gen Hosp, Mongan Inst, Hlth Policy Res Inst, Boston, MA USA
[7] Harvard Med Sch, Dept Hlth Care Policy, Boston, MA USA
[8] Brigham & Womens Hosp, Dept Med, Div Allergy & Clin Immunol, Boston, MA 02145 USA
[9] Brigham & Womens Hosp, Div Gen Internal Med & Primary Care, 399 Revolut Dr,Suite 777, Somerville, MA 02145 USA
Keywords
Gender identity; Transgender persons; Sexual and gender minorities; Electronic health records; Machine learning; Natural language processing
DOI
10.1016/j.jbi.2023.104507
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Background: Although accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations, it remains a challenging task because gender information in structured EHR fields is incomplete.

Objective: Using TGD identification as a case study, this research uses natural language processing (NLP) and deep learning to build an accurate predictive model of patient gender identity, aiming to tackle the challenges of identifying relevant patient-level information from EHR data and reducing annotation work.

Methods: This study included adult patients in a large healthcare system in Boston, MA, between 4/1/2017 and 4/1/2022. To identify relevant information in a large volume of clinical notes, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. This keyword list was used to pre-screen potential TGD individuals and create two datasets for model training, testing, and validation. Dataset I was a balanced dataset that contained clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. The performance of the deep learning model was compared to traditional machine learning and rule-based algorithms.

Results: The final keyword list consisted of 109 keywords, of which 58 (53.2%) were added through expansion by the BioWordVec model. Dataset I contained 3,150 patients (50% TGD), while Dataset II contained 200 patients (90% TGD). On Dataset I, the deep learning model achieved an F1 score of 0.917, a sensitivity of 0.854, and a precision of 0.980; on Dataset II, it achieved an F1 score of 0.969, a sensitivity of 0.967, and a precision of 0.972. The deep learning model significantly outperformed rule-based algorithms.

Conclusion: This is the first study to show that deep learning-integrated NLP algorithms can accurately identify gender identity using EHR data. Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms.
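
As a reading aid for the Results, the short Python sketch below recomputes the F1 score as the harmonic mean of the precision and sensitivity values quoted in the abstract. The function name `f1_score` and the `reported` dictionary are illustrative and not taken from the paper. The abstract does not say how the metrics were aggregated (for example, averaged over folds or runs), so a small gap between a recomputed and a reported value should not be read as an error.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values quoted in the abstract: (precision, sensitivity, reported F1).
reported = {
    "Dataset I": (0.980, 0.854, 0.917),
    "Dataset II": (0.972, 0.967, 0.969),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {f1_reported:.3f}")
```

For Dataset II the recomputed value agrees with the reported 0.969 to three decimals. For Dataset I the point values give about 0.913 rather than 0.917, which would be consistent with metrics averaged across several evaluation runs, though that is an assumption here.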
Pages: 10