A MULTIVARIATE FREQUENCY-SEVERITY FRAMEWORK FOR HEALTHCARE DATA BREACHES

Cited by: 1
Authors
Sun, Hong [1]
Xu, Maochao [2]
Zhao, Peng [3]
Affiliations
[1] Lanzhou Univ, Sch Math & Stat, Lanzhou, Peoples R China
[2] Illinois State Univ, Dept Math, Normal, IL USA
[3] Jiangsu Normal Univ, Sch Math & Stat, Jiangsu Prov Key Lab Educ Big Data Sci & Engn, RIMS, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Copula; data breach; heavy tail; multivariate dependence; score; COUNT DATA; REGRESSION; MODELS; RULES; COSTS; RISK;
DOI
10.1214/22-AOAS1625
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Data breaches in healthcare have become a substantial concern in recent years and cause millions of dollars in financial losses each year. It is fundamental for government regulators, insurance companies, and stakeholders to understand the breach frequency and the number of affected individuals in each state, as these are directly related to the federal Health Insurance Portability and Accountability Act (HIPAA) and state data breach laws. However, an obstacle to studying data breaches in healthcare is the lack of suitable statistical approaches. We develop a novel multivariate frequency-severity framework to analyze breach frequency and the number of affected individuals at the state level. A mixed effects model is developed to model the square-root-transformed frequency, and the log-gamma distribution is proposed to capture the skewness and heavy tail exhibited by the distribution of the numbers of affected individuals. We further discover a positive nonlinear dependence between the transformed frequency and the log-transformed numbers of affected individuals (i.e., severity). In particular, we propose to use a D-vine copula to capture the multivariate dependence among the conditional severities given the frequencies, owing to its inherent temporal structure and rich bivariate copula families. A rejection sampling technique is developed to simulate the predictive distributions. Both the in-sample and out-of-sample studies show that the proposed multivariate frequency-severity model, which accommodates nonlinear dependence, has satisfactory fitting and prediction performance.
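The abstract names rejection sampling as the tool for simulating predictive distributions but does not describe the algorithm. As a rough illustration of the general technique only, and not the authors' procedure, the sketch below draws from a Gamma(shape 2, scale 1) target using an exponential proposal. The target, the proposal, and the function name `sample_gamma2_by_rejection` are all assumptions chosen for simplicity.

```python
import math
import random


def sample_gamma2_by_rejection(n, seed=0):
    """Draw n samples from Gamma(shape=2, scale=1) by rejection sampling.

    Illustrative assumptions (not taken from the paper):
      target   f(x) = x * exp(-x),           x > 0
      proposal g(x) = 0.5 * exp(-x / 2),     x > 0  (exponential, rate 0.5)
    The ratio f/g = 2x * exp(-x/2) peaks at x = 2, so the envelope
    constant is M = 4 / e, and the acceptance probability is
      f(x) / (M * g(x)) = (x / 2) * exp(1 - x / 2), which never exceeds 1.
    """
    rng = random.Random(seed)
    draws = []
    proposals = 0
    while len(draws) < n:
        x = rng.expovariate(0.5)  # candidate from the proposal
        proposals += 1
        accept_prob = (x / 2.0) * math.exp(1.0 - x / 2.0)
        if rng.random() < accept_prob:
            draws.append(x)
    return draws, proposals


draws, proposals = sample_gamma2_by_rejection(20000)
sample_mean = sum(draws) / len(draws)        # target mean is 2
acceptance_rate = len(draws) / proposals     # theoretical rate is e / 4
print(round(sample_mean, 2), round(acceptance_rate, 2))
```

The theoretical acceptance rate is 1/M = e/4, roughly 0.68, so about two in three candidates are kept. In a copula-based predictive setting the target density would be more involved, but the accept/reject step has the same shape.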
Pages: 240-268
Page count: 29