Natural Language Processing and Machine Learning to Identify People Who Inject Drugs in Electronic Health Records

Cited by: 6
Authors
Goodman-Meza, David [1 ,2 ]
Tang, Amber [3 ]
Aryanfar, Babak [2 ]
Vazquez, Sergio [4 ]
Gordon, Adam J. [5 ,6 ]
Goto, Michihiko [7 ,8 ]
Goetz, Matthew Bidwell [2 ,3 ]
Shoptaw, Steven [9 ]
Bui, Alex A. T. [10 ]
Affiliations
[1] Univ Calif Los Angeles, David Geffen Sch Med, Div Infect Dis, Los Angeles, CA 90095 USA
[2] Vet Affairs Greater Los Angeles Healthcare Syst, Los Angeles, CA USA
[3] Univ Calif Los Angeles, David Geffen Sch Med, Dept Internal Med, Los Angeles, CA 90095 USA
[4] Dartmouth Coll, Undergrad Studies, Hanover, NH 03755 USA
[5] Vet Affairs Salt Lake City Hlth Care Syst, Informat Decis Enhancement & Analyt Sci Ctr, Salt Lake City, UT USA
[6] Univ Utah, Sch Med, Dept Internal Med, Div Epidemiol, Salt Lake City, UT USA
[7] Univ Iowa, Dept Internal Med, Iowa City, IA 52242 USA
[8] Iowa City Vet Affairs Med Ctr, Ctr Access & Delivery Res & Evaluat, Iowa City, IA USA
[9] Univ Calif Los Angeles, David Geffen Sch Med, Dept Family Med, Los Angeles, CA 90095 USA
[10] Univ Calif Los Angeles, Dept Radiol Sci, Med & Imaging Informat Grp, Los Angeles, CA 90095 USA
Source
OPEN FORUM INFECTIOUS DISEASES | 2022 / Vol. 9 / Issue 09
Keywords
EHR; machine learning; NLP; PWID; ENDOCARDITIS; ALGORITHM; DISEASES; RISE;
DOI
10.1093/ofid/ofac471
Chinese Library Classification (CLC) number
R392 [Medical immunology]; Q939.91 [Immunology];
Discipline classification code
100102;
Abstract
Background. Improving the identification of people who inject drugs (PWID) in electronic medical records can improve clinical decision making, risk assessment and mitigation, and health service research. Identification of PWID currently consists of heterogeneous, nonspecific International Classification of Diseases (ICD) codes as proxies. Natural language processing (NLP) and machine learning (ML) methods may have better diagnostic metrics than nonspecific ICD codes for identifying PWID.
Methods. We manually reviewed 1000 records of patients diagnosed with Staphylococcus aureus bacteremia admitted to Veterans Health Administration hospitals from 2003 through 2014. The manual review was the reference standard. We developed and trained NLP/ML algorithms with and without regular expression filters for negation (NegEx) and compared these with 11 proxy combinations of ICD codes to identify PWID. Data were split 70% for training and 30% for testing. We calculated diagnostic metrics and estimated 95% confidence intervals (CIs) by bootstrapping the hold-out test set. Best models were determined by best F-score, a summary of sensitivity and positive predictive value.
Results. Random forest with and without NegEx were the best-performing NLP/ML algorithms in the training set. Random forest with NegEx outperformed all ICD-based algorithms. F-score for the best NLP/ML algorithm was 0.905 (95% CI, .786-.967) and 0.592 (95% CI, .550-.632) for the best ICD-based algorithm. The NLP/ML algorithm had a sensitivity of 92.6% and specificity of 95.4%.
Conclusions. NLP/ML outperformed ICD-based coding algorithms at identifying PWID in electronic health records. NLP/ML models should be considered in identifying cohorts of PWID to improve clinical decision making, health services research, and administrative surveillance.
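The evaluation step described in the abstract (an F-score summarizing sensitivity and positive predictive value, with 95% CIs obtained by bootstrapping the hold-out test set) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the F-score is the balanced F1 (harmonic mean), uses a percentile bootstrap, and the function names `f_score` and `bootstrap_ci`, the replicate count, and the toy labels are all hypothetical.

```python
import random


def f_score(y_true, y_pred):
    """F1 (assumed): harmonic mean of sensitivity and positive predictive value."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F-score is defined as 0 here
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_ci(y_true, y_pred, metric=f_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test records with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy hold-out set (made-up labels, not study data): 1 = PWID, 0 = not PWID.
y_true = [1] * 10 + [0] * 40
y_pred = [1] * 9 + [0] * 1 + [1] * 2 + [0] * 38  # 9 TP, 1 FN, 2 FP, 38 TN

point = f_score(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"F-score {point:.3f} (95% CI, {lower:.3f}-{upper:.3f})")
```

Resampling only the test records, with the fitted model held fixed, reflects uncertainty from the size of the hold-out set rather than from retraining, which is consistent with the wide interval the abstract reports for a 30% test split.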
Pages: 9