Evaluating Machine Learning Stability in Predicting Depression and Anxiety Amidst Subjective Response Errors

Cited by: 7

Authors:

Ku, Wai Lim [1]

Min, Hua [2]

Affiliations:

[1] NHLBI, Syst Biol Ctr, NIH, Bethesda, MD 20892 USA

[2] George Mason Univ, Coll Publ Hlth, Dept Hlth Adm & Policy, Fairfax, VA 22030 USA

Source:

HEALTHCARE | 2024 / Vol. 12 / No. 06

Keywords:

mental health prediction; machine learning; stability; electronic health records; data perturbation; algorithmic bias; survey data analysis; BIAS; ASSOCIATION;

DOI:

10.3390/healthcare12060625

Chinese Library Classification (CLC) number:

R19 [Health organizations and services (health services administration)];

Subject classification number:

Abstract:

Major Depressive Disorder (MDD) and Generalized Anxiety Disorder (GAD) pose significant burdens on individuals and society, necessitating accurate prediction methods. Machine learning (ML) algorithms utilizing electronic health records and survey data offer promising tools for forecasting these conditions. However, potential bias and inaccuracies inherent in subjective survey responses can undermine the precision of such predictions. This research investigates the reliability of five prominent ML algorithms in predicting MDD and GAD: a Convolutional Neural Network (CNN), Random Forest, XGBoost, Logistic Regression, and Naive Bayes. A dataset rich in biomedical, demographic, and self-reported survey information is used to assess the algorithms' performance under different levels of subjective response inaccuracies. These inaccuracies simulate scenarios with potential memory recall bias and subjective interpretations. While all algorithms demonstrate commendable accuracy with high-quality survey data, their performance diverges significantly when encountering erroneous or biased responses. Notably, the CNN exhibits superior resilience in this context, maintaining performance and even achieving enhanced accuracy, Cohen's kappa score, and positive precision for both MDD and GAD. This highlights the CNN's superior ability to handle data unreliability, making it a potentially advantageous choice for predicting mental health conditions based on self-reported data. These findings underscore the critical importance of algorithmic resilience in mental health prediction, particularly when relying on subjective data. They emphasize the need for careful algorithm selection in such contexts, with the CNN emerging as a promising candidate due to its robustness and improved performance under data uncertainties.

References

Pages: 32

55 entries in total

[31] Niu X, 2020, arXiv, DOI arXiv:2005.00580

[32] O'Shea K, 2015, arXiv, DOI arXiv:1511.08458, DOI 10.48550/arXiv.1511.08458

[33] Mitigating underreported error in food frequency questionnaire data using a supervised machine learning method and error adjustment algorithm [J].

Popoola, Anjolaoluwa Ayomide ;

Frediani, Jennifer Koren ;

Hartman, Terryl Johnson ;

Paynabar, Kamran .

BMC MEDICAL INFORMATICS AND DECISION MAKING, 2023, 23 (01)

[34] Predicting Anxiety, Depression and Stress in Modern Life using Machine Learning Algorithms [J].

Priya, Anu ;

Garg, Shruti ;

Tigga, Neha Prerna .

INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA SCIENCE, 2020, 167 :1258-1267

[35] Radwan A, 2024, arXiv, DOI arXiv:2402.00910

[36] Ram Kumar R. P., 2020, Proceedings of the Third International Conference on Computational Intelligence and Informatics. ICCII 2018. Advances in Intelligent Systems and Computing (AISC 1090), P683, DOI 10.1007/978-981-15-1480-7_59

[37] A machine learning analysis of COVID-19 mental health data [J].