The motor vehicle insurance industry in China has developed rapidly into a market of enormous size. In recent years, auto insurance premiums have been the primary component of non-life insurers' income, accounting for up to 70%. Setting an appropriate premium is one of the most important problems in the insurance business, and the prediction of future losses, in terms of both severity and frequency, is its core issue. We focus on claim frequency prediction and compare various methods and models from a statistical perspective. Generalized Linear Models (GLMs) are traditionally the most widely used models; however, they have been criticized for their rigid distributional assumptions and for failing to describe dependence between variables well. With the rapid development of big data and machine learning, some scholars have applied machine learning algorithms to auto claim data, and their studies have shown that machine learning methods outperform traditional GLMs in some respects, but these results largely depend on particular datasets. We compare GLMs with machine learning algorithms, including Deep Neural Networks (DNN), Random Forests, Support Vector Machines (SVM), and XGBoost, on six datasets from different countries. In particular, one of the datasets consists of auto insurance group customer data, which differs markedly from the usual auto insurance business: for example, driver-level rating factors are unavailable, and the claims of vehicles within the same group are correlated (intra-group dependence). These features make such data difficult for generalized linear models to handle well. First, we build different GLMs on the training sets to predict claim frequency and select the best one based on AIC. We then train the machine learning models mentioned above on the same training sets, and compare all models by computing the mean squared error (MSE) on the test sets. Our study shows that the XGBoost model outperforms GLMs on all datasets, but when the number of predictors is larger, the dependence between variables is stronger, and the datasets are bigger, the RNN and DNN models outperform XGBoost. The performance of SVM is generally inferior to that of GLMs.
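To make the comparison procedure described above concrete, the following is a minimal Python sketch of one pairing: candidate Poisson GLMs selected by AIC versus an XGBoost Poisson regressor, both evaluated by test-set MSE. It is not the paper's actual pipeline; the synthetic data, feature names, candidate GLM designs, and hyperparameters are all illustrative assumptions.

```python
# Illustrative sketch only: synthetic data, hypothetical features and settings.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "age": rng.uniform(18, 80, n),            # hypothetical rating factors
    "car_value": rng.lognormal(2.5, 0.5, n),
    "region": rng.integers(0, 4, n),
})
# Synthetic claim counts whose rate depends nonlinearly on age (purely illustrative)
lam = np.exp(-2.0 + 0.02 * (X["age"] - 45) ** 2 / 100 + 0.1 * X["region"])
y = rng.poisson(lam)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate GLM designs: main effects only vs. main effects plus a squared age term
designs = {
    "main": sm.add_constant(X_tr),
    "quad": sm.add_constant(X_tr.assign(age2=X_tr["age"] ** 2)),
}
fits = {k: sm.GLM(y_tr, d, family=sm.families.Poisson()).fit() for k, d in designs.items()}
best = min(fits, key=lambda k: fits[k].aic)   # model selection by AIC

X_te_best = sm.add_constant(X_te if best == "main" else X_te.assign(age2=X_te["age"] ** 2))
mse_glm = mean_squared_error(y_te, fits[best].predict(X_te_best))

# XGBoost with a Poisson objective as one machine learning competitor
xgb_model = xgb.XGBRegressor(objective="count:poisson", n_estimators=300,
                             max_depth=3, learning_rate=0.05)
xgb_model.fit(X_tr, y_tr)
mse_xgb = mean_squared_error(y_te, xgb_model.predict(X_te))

print(f"GLM ({best}) test MSE: {mse_glm:.4f} | XGBoost test MSE: {mse_xgb:.4f}")
```

The same train/test split and MSE criterion would be applied to the other competitors (DNN, Random Forest, SVM) to keep the comparison consistent across models.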