Big Data fraud detection using multiple medicare data sources

Cited by: 84
Authors
Herland, Matthew [1]
Khoshgoftaar, Taghi M. [1]
Bauder, Richard A. [1]
Affiliation
[1] Florida Atlantic Univ, 777 Glades Rd, Boca Raton, FL 33431 USA
Keywords
Big Data; U.S. Medicare; LEIE; Fraud detection
DOI
10.1186/s40537-018-0138-3
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the United States, advances in technology and medical sciences continue to improve the general well-being of the population. With this continued progress, programs such as Medicare are needed to help manage the high costs associated with quality healthcare. Unfortunately, there are individuals who commit fraud for nefarious reasons and personal gain, limiting Medicare's ability to effectively provide for the healthcare needs of the elderly and other qualifying people. To minimize fraudulent activities, the Centers for Medicare and Medicaid Services (CMS) released a number of "Big Data" datasets for different parts of the Medicare program. In this paper, we focus on the detection of Medicare fraud using the following CMS datasets: (1) Medicare Provider Utilization and Payment Data: Physician and Other Supplier (Part B), (2) Medicare Provider Utilization and Payment Data: Part D Prescriber (Part D), and (3) Medicare Provider Utilization and Payment Data: Referring Durable Medical Equipment, Prosthetics, Orthotics and Supplies (DMEPOS). Additionally, we create a fourth dataset which is a combination of the three primary datasets. We discuss data processing for all four datasets and the mapping of real-world provider fraud labels using the List of Excluded Individuals and Entities (LEIE) from the Office of the Inspector General. Our exploratory analysis on Medicare fraud detection involves building and assessing three learners on each dataset. Based on the Area under the Receiver Operating Characteristic (ROC) Curve performance metric, our results show that the Combined dataset with the Logistic Regression (LR) learner yielded the best overall score at 0.816, closely followed by the Part B dataset with LR at 0.805. Overall, the Combined and Part B datasets produced the best fraud detection performance with no statistical difference between these datasets, over all the learners. 
Therefore, based on our results and the assumption that there is no way to know within which part of Medicare a physician will commit fraud, we suggest using the Combined dataset for detecting fraudulent behavior when a physician has submitted payments through any or all Medicare parts evaluated in our study.
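The workflow the abstract outlines (combine per-provider records from several sources, attach fraud labels from an exclusion list, score with the area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' pipeline. The provider identifiers, feature names, and values are hypothetical toy data, the inner join on a single provider id is an assumption about how a combined dataset might be built, and the functions `combine`, `label`, and `roc_auc` are illustrative helpers written for this example.

```python
from typing import Dict, List, Set

Row = Dict[str, float]

# Hypothetical toy rows keyed by a provider identifier (all values invented).
part_b: Dict[str, Row] = {"1001": {"b_avg_payment": 80.0},
                          "1002": {"b_avg_payment": 310.0},
                          "1003": {"b_avg_payment": 95.0}}
part_d: Dict[str, Row] = {"1001": {"d_total_drug_cost": 1200.0},
                          "1002": {"d_total_drug_cost": 9800.0},
                          "1004": {"d_total_drug_cost": 400.0}}
dmepos: Dict[str, Row] = {"1001": {"e_supplier_claims": 3.0},
                          "1002": {"e_supplier_claims": 41.0},
                          "1003": {"e_supplier_claims": 5.0}}


def combine(*sources: Dict[str, Row]) -> Dict[str, Row]:
    """Inner join: keep only providers that appear in every source."""
    shared = set(sources[0])
    for src in sources[1:]:
        shared &= set(src)
    combined: Dict[str, Row] = {}
    for pid in sorted(shared):
        row: Row = {}
        for src in sources:
            row.update(src[pid])  # feature names are assumed not to collide
        combined[pid] = row
    return combined


def label(rows: Dict[str, Row], excluded: Set[str]) -> Dict[str, int]:
    """Mark a provider as fraudulent (1) if its id is on the exclusion list."""
    return {pid: int(pid in excluded) for pid in rows}


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


combined = combine(part_b, part_d, dmepos)
fraud_labels = label(combined, excluded={"1002"})  # hypothetical exclusion list
```

In this toy setup only providers present in all three sources survive the join, which reflects one trade-off of a combined dataset: richer features per provider, but fewer providers. The `roc_auc` helper accepts scores from any learner, so the same comparison can be repeated per dataset and per model.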
Pages: 21