The effects of varying class distribution on learner behavior for Medicare fraud detection with imbalanced big data

Cited by: 57

Authors:

Bauder, Richard A. [1]

Khoshgoftaar, Taghi M. [1]

Affiliations:

[1] Florida Atlantic Univ, Coll Engn & Comp Sci, Boca Raton, FL 33431 USA

Source:

HEALTH INFORMATION SCIENCE AND SYSTEMS | 2018 / Vol. 6

Keywords:

Medicare fraud; Class imbalance; Random undersampling; Big data

DOI:

10.1007/s13755-018-0051-3

Chinese Library Classification (CLC) number:

R-058

Abstract:

Healthcare in the United States is a critical aspect of most people's lives, particularly for the aging demographic. This rising elderly population continues to demand more cost-effective healthcare programs. Medicare is a vital program serving the needs of the elderly in the United States. The growing number of Medicare beneficiaries, along with the enormous volume of money in the healthcare industry, increases the appeal for, and risk of, fraud. In this paper, we focus on the detection of Medicare Part B provider fraud, which involves fraudulent activities, such as patient abuse or neglect and billing for services not rendered, perpetrated by providers and other entities who have been excluded from participating in Federal healthcare programs. We discuss Part B data processing and describe a unique process for mapping fraud labels with known fraudulent providers. The labeled big dataset is highly imbalanced with a very limited number of fraud instances. In order to combat this class imbalance, we generate seven class distributions and assess the behavior and fraud detection performance of six different machine learning methods. Our results show that RF100 using a 90:10 class distribution is the best learner with an AUC of 0.87302. Moreover, learner behavior with the 50:50 balanced class distribution is similar to more imbalanced distributions which keep more of the original data. Based on the performance and significance testing results, we posit that retaining more of the majority class information leads to better Medicare Part B fraud detection performance over the balanced datasets across the majority of learners.
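As an illustration of the random undersampling named in the keywords, the following minimal Python sketch shows one way a target class distribution such as 90:10 or 50:50 could be produced: every minority (fraud) instance is kept and majority instances are sampled at random until the requested ratio is reached. This is not the authors' implementation. The function name `undersample`, its parameters, and the toy label list are hypothetical, and the record names only the 90:10 and 50:50 distributions out of the seven used.

```python
import random


def undersample(labels, majority_parts, minority_parts, seed=0):
    """Return sorted row indices for a majority:minority class ratio.

    All minority rows (label 1) are kept; majority rows (label 0) are
    drawn at random without replacement. Hypothetical helper, not the
    paper's code.
    """
    if majority_parts <= 0 or minority_parts <= 0:
        raise ValueError("ratio parts must be positive")
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Number of majority rows needed to reach the requested ratio,
    # capped at the number of majority rows actually available.
    target = min(len(majority), len(minority) * majority_parts // minority_parts)
    kept_majority = rng.sample(majority, target)
    return sorted(minority + kept_majority)


if __name__ == "__main__":
    # Toy data (assumed): 10 fraud rows among 1,000 providers.
    toy_labels = [1] * 10 + [0] * 990
    for maj, mino in ((90, 10), (50, 50)):
        idx = undersample(toy_labels, maj, mino)
        n_fraud = sum(toy_labels[i] for i in idx)
        print(f"{maj}:{mino} -> {len(idx) - n_fraud} non-fraud, {n_fraud} fraud")
```

With the toy data, the 90:10 setting retains nine times as many majority rows as the 50:50 setting, which is the sense in which less aggressive undersampling keeps more of the majority class information.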

Pages: 14