A comparison of rule-based and machine learning approaches for classifying patient portal messages

Cited by: 42

Authors:

Cronin, Robert M. [1,2,3]

Fabbri, Daniel [1,4]

Denny, Joshua C. [1,2]

Rosenbloom, S. Trent [1,2,3]

Jackson, Gretchen Purcell [1,3,5]

Affiliations:

[1] Vanderbilt Univ, Med Ctr, Dept Biomed Informat, 2525 West End Blvd,Suite 1475, Nashville, TN 37232 USA

[2] Vanderbilt Univ, Med Ctr, Dept Med, Nashville, TN 37232 USA

[3] Vanderbilt Univ, Med Ctr, Dept Pediat, Nashville, TN 37232 USA

[4] Vanderbilt Univ, Dept Comp Sci, Nashville, TN 37232 USA

[5] Vanderbilt Univ, Med Ctr, Dept Pediat Surg, Nashville, TN 37232 USA

Source:

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS | 2017 / Vol. 105

Keywords:

Patient portal; Text classification; Natural language processing; Machine learning; TEXT CLASSIFICATION; SECURE MESSAGES; PRIMARY-CARE; HEALTH-CARE; INTERVENTIONS; COMMUNICATION; INFORMATION; COMPUTER;

DOI:

10.1016/j.ijmedinf.2017.06.004

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Objective: Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care.

Materials and methods: We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers.

Results: The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naive Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard Index: 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal Concept, which maps to 'morning', 'evening' and Idea or Concept which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much', informational: 'question', 'mean').

Conclusions: This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering.
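The abstract scores multi-label predictions with the Jaccard Index, which for two label sets A and B is |A ∩ B| / |A ∪ B|. The following is a minimal sketch of that measure, not the authors' implementation. Averaging the index over messages is an assumption, since the abstract does not state how per-message scores were aggregated. The function names and the three example messages are hypothetical.

```python
def jaccard_index(predicted, actual):
    """Jaccard index of two label collections: |A & B| / |A | B|."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    if not union:
        # Assumption: two empty label sets are treated as a perfect match.
        return 1.0
    return len(predicted & actual) / len(union)


def mean_jaccard(predictions, gold):
    """Average the per-message Jaccard index (aggregation is an assumption)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("at least one message is required")
    scores = [jaccard_index(p, g) for p, g in zip(predictions, gold)]
    return sum(scores) / len(scores)


# Hypothetical gold-standard and predicted communication types for 3 messages.
gold = [{"medical", "logistical"}, {"social"}, {"informational", "medical"}]
pred = [{"medical"}, {"social"}, {"informational", "logistical"}]

print(round(mean_jaccard(pred, gold), 3))
```

Unlike the per-type AUC, this measure gives partial credit when a classifier recovers only some of the communication types present in a single message.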

Pages: 110-120

Page count: 11
