Claim Frequency Modeling and Prediction via Machine Learning

Cited: 0
Authors
Zeng Yuzhe [1 ]
Wu Aibo [1 ]
Zheng Hongyuan [1 ]
Luo Laijuan [1 ]
Affiliations
[1] Renmin Univ China, Sch Stat, Beijing 100872, Peoples R China
Source
PROCEEDINGS OF 2018 CHINA INTERNATIONAL CONFERENCE ON INSURANCE AND RISK MANAGEMENT | 2018
Keywords
Auto insurance; Claim frequency; Machine learning
DOI
None
Chinese Library Classification
F8 [Fiscal Affairs, Finance];
Discipline code
0202;
Abstract
The motor vehicle insurance industry in China has developed rapidly and now serves a huge market. Auto insurance premiums have become the primary component of non-life insurers' income, reaching up to 70% in recent years. Setting a proper premium is one of the most important problems in the insurance business, and the prediction of future loss, including both severity and frequency, is its core issue. We focus on frequency prediction and compare various methods and models from a statistical viewpoint. Generalized Linear Models (GLMs) are traditionally the most widely used, but they have been criticized for rigid distributional assumptions and for failing to capture dependence between variables. In recent years, with the rapid development of big data and machine learning, some scholars have applied machine-learning algorithms to auto claim data; their studies show that machine-learning methods outperform traditional GLMs in some respects, but these results rely largely on particular datasets. We compare GLMs with machine-learning algorithms, including Deep Neural Networks (DNN), Random Forests, Support Vector Machines (SVM), and XGBoost, on six datasets from different countries. In particular, one of the datasets contains auto insurance group-customer data, which is very different from the usual bulk-vehicle business: human rating factors cannot be used, and claims on vehicles within the same group are dependent (intra-group dependence). These features make the data difficult for generalized linear models to handle. First, we build different GLMs on the training sets to predict frequency and choose the best by AIC. We then train the machine-learning models mentioned above on the same training sets, and compare all models by computing mean squared error on the test sets. Our study shows that XGBoost outperforms GLM on all datasets. However, with a larger number of predictors, stronger dependence between variables, and bigger datasets, RNN and DNN models perform better than XGBoost. SVM's performance is generally inferior to GLM.
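The comparison protocol described in the abstract — fit a Poisson-family GLM for claim frequency on a training set, select by AIC, then score by test-set mean squared error — can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy; the data-generating process and all variable names are assumptions, and the machine-learning competitors (XGBoost, DNN, etc.) are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic auto-policy data: two rating factors, Poisson-distributed
# claim counts. Purely illustrative, not the paper's actual datasets.
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([-2.0, 0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Train/test split, mirroring the paper's protocol.
X_tr, X_te, y_tr, y_te = X[:4000], X[4000:], y[:4000], y[4000:]

def fit_poisson_glm(X, y, iters=25):
    """Poisson GLM with log link, fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)           # predicted mean claim frequency
        grad = X.T @ (y - mu)           # score vector
        hess = X.T @ (X * mu[:, None])  # Fisher information (Poisson: Var = mean)
        beta += np.linalg.solve(hess, grad)
    return beta

beta_hat = fit_poisson_glm(X_tr, y_tr)

# AIC for model selection (the log(y!) constant is omitted, since it
# cancels when comparing models fitted to the same training data).
loglik = np.sum(y_tr * (X_tr @ beta_hat) - np.exp(X_tr @ beta_hat))
aic = 2 * len(beta_hat) - 2 * loglik

# Test-set mean squared error, the paper's comparison metric.
mse = np.mean((y_te - np.exp(X_te @ beta_hat)) ** 2)
print(f"AIC = {aic:.1f}, test MSE = {mse:.4f}")
```

In a full comparison, each candidate model would be scored with the same test-set MSE, so that GLM and machine-learning predictions are judged on identical ground.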
Pages: 594-616
Page count: 23