A MULTIVARIATE FREQUENCY-SEVERITY FRAMEWORK FOR HEALTHCARE DATA BREACHES

Times cited: 1
Authors
Sun, Hong [1]
Xu, Maochao [2]
Zhao, Peng [3]
Affiliations
[1] Lanzhou Univ, Sch Math & Stat, Lanzhou, Peoples R China
[2] Illinois State Univ, Dept Math, Normal, IL USA
[3] Jiangsu Normal Univ, Sch Math & Stat, Jiangsu Prov Key Lab Educ Big Data Sci & Engn, RIMS, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Copula; data breach; heavy tail; multivariate dependence; score; COUNT DATA; REGRESSION; MODELS; RULES; COSTS; RISK;
DOI
10.1214/22-AOAS1625
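Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract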
Data breaches in healthcare have become a substantial concern in recent years, causing millions of dollars in financial losses each year. It is essential for government regulators, insurance companies, and other stakeholders to understand the breach frequency and the number of affected individuals in each state, as both are directly related to the federal Health Insurance Portability and Accountability Act (HIPAA) and to state data breach laws. However, the study of healthcare data breaches has been hindered by the lack of suitable statistical approaches. We develop a novel multivariate frequency-severity framework to analyze breach frequency and the number of affected individuals at the state level. A mixed effects model is developed for the square-root-transformed frequency, and the log-gamma distribution is proposed to capture the skewness and heavy tail exhibited by the distribution of the number of affected individuals. We further find a positive nonlinear dependence between the transformed frequency and the log-transformed number of affected individuals (i.e., the severity). In particular, we propose a D-vine copula to capture the multivariate dependence among the severities conditional on the frequencies, owing to its inherent temporal structure and its rich choice of bivariate copula families. A rejection sampling technique is developed to simulate the predictive distributions. Both in-sample and out-of-sample studies show that the proposed multivariate frequency-severity model, which accommodates nonlinear dependence, has satisfactory fitting and prediction performance.
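To illustrate why a log-gamma law suits a skewed, heavy-tailed quantity such as the number of affected individuals, here is a minimal sketch. It assumes the common actuarial parameterization X = exp(Y) with Y ~ Gamma(shape α, scale θ), so that log X (the severity) is gamma distributed. This record does not state the paper's exact parameterization, covariates, or estimation method, so the function names and parameter values below are hypothetical. Because the gamma moment generating function is (1 - tθ)^(-α) for t < 1/θ, the moment E[X^k] is finite only when k < 1/θ, which makes the tail polynomial rather than exponential.

```python
import math

import numpy as np


def sample_log_gamma(shape, scale, size, rng):
    """Draw X = exp(Y) with Y ~ Gamma(shape, scale) (assumed parameterization).

    Every draw exceeds 1 because Y > 0.
    """
    y = rng.gamma(shape, scale, size)
    return np.exp(y)


def log_gamma_moment(k, shape, scale):
    """E[X^k] = E[exp(kY)] = (1 - k*scale)^(-shape), finite only if k*scale < 1."""
    if k * scale >= 1:
        return math.inf
    return (1.0 - k * scale) ** (-shape)


def fit_log_gamma_moments(x):
    """Method-of-moments estimates of (shape, scale) from Y = log(X).

    Uses mean(Y) = shape*scale and var(Y) = shape*scale^2.
    A simple stand-in for likelihood-based fitting (hypothetical helper).
    """
    y = np.log(np.asarray(x, dtype=float))
    m, v = y.mean(), y.var()
    return m * m / v, v / m


if __name__ == "__main__":
    rng = np.random.default_rng(2022)
    x = sample_log_gamma(shape=2.0, scale=0.2, size=200_000, rng=rng)
    print("min draw:", x.min())
    print("sample mean:", x.mean(), "theoretical:", log_gamma_moment(1, 2.0, 0.2))
    # With scale 0.2 the tail index is 1/0.2 = 5, so E[X^5] does not exist.
    print("E[X^5] finite?", math.isfinite(log_gamma_moment(5, 2.0, 0.2)))
    print("fitted (shape, scale):", fit_log_gamma_moments(x))
```

The method-of-moments fit is used only because it is short; a full frequency-severity model would more plausibly maximize a likelihood and let the parameters depend on state-level effects and the transformed frequency.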
Pages: 240-268
Number of pages: 29