Smoking Status Classification: A Comparative Analysis of Machine Learning Techniques with Clinical Real World Data

Times cited: 1
Authors
Kugic, Amila [1]
Abdulnazar, Akhila [1, 2]
Knezovic, Anto [1]
Schulz, Stefan [1]
Kreuzthaler, Markus [1]
Affiliations
[1] Med Univ Graz, Inst Med Informat Stat & Documentat, Graz, Austria
[2] CBmed GmbH Ctr Biomarker Res Med, Graz, Austria
Source
ARTIFICIAL INTELLIGENCE IN MEDICINE, PT I, AIME 2024 | 2024 / Vol. 14844
Keywords
Natural Language Processing; Electronic Health Records; Machine Learning
DOI
10.1007/978-3-031-66538-7_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Electronic health records often lack consistent and structured documentation of lifestyle-related risk factors. This study addresses the gap by presenting methods for standardizing the recording of patients' smoking status. Different types of machine learning methods are applied to an anonymized set of German-language clinical narratives to classify smoking status as a multi-class task, using SNOMED CT as the terminology standard. Our findings demonstrate the effectiveness of adapting medBERT.de, an openly available German medical language model, to this downstream task: it achieves the best performance, with an F1-measure 95% confidence interval of [0.969, 0.976], compared with CNN and LSTM models and an SVM baseline.
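The abstract reports the F1-measure as a 95% confidence interval but does not state how the multi-class F1 was averaged or how the interval was obtained. One common approach is a percentile bootstrap over test documents. The sketch below assumes macro-averaged F1 and a percentile bootstrap; the function names `macro_f1` and `bootstrap_ci` and the class labels in the usage example are hypothetical and are not taken from the study or its SNOMED CT value set.

```python
import random
from typing import List, Sequence, Tuple


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred.

    Assumption: macro averaging; the study may have used another average.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for macro-F1, resampling documents with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical gold labels and predictions, NOT data from the study.
    gold = ["current", "former", "never", "never", "current", "unknown", "former", "never"]
    pred = ["current", "never", "never", "never", "current", "unknown", "former", "former"]
    print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
    lo, hi = bootstrap_ci(gold, pred, n_resamples=2000, seed=42)
    print(f"95% CI = [{lo:.3f}, {hi:.3f}]")
```

Resampling whole documents keeps each prediction paired with its gold label. With a test set as small as in the usage example the interval is very wide; a narrow interval such as the one reported requires a much larger test set.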
Pages: 182-191
Number of pages: 10