Accuracy of Nonparametric Density Estimation for Univariate Gaussian Mixture Models: a Comparative Study

Cited by: 0
Authors
Arnastauskaite, Jurgita [1, 2]
Ruzgas, Tomas [1]
Affiliations
[1] Kaunas Univ Technol, Dept Appl Math, K Donelaicio G 73, LT-44249 Kaunas, Lithuania
[2] Kaunas Univ Technol, Dept Comp Sci, K Donelaicio G 73, LT-44249 Kaunas, Lithuania
Keywords
univariate probability density; nonparametric density estimation; homogeneity test; sample clustering; Monte Carlo method; municipal solid waste; SOLID-WASTE GENERATION; MAXIMUM-LIKELIHOOD; TESTS
DOI
10.3846/mma.2020.10505
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Flexible and reliable probability density estimation is fundamental in unsupervised learning and classification. Finite Gaussian mixture models are commonly used for this purpose. However, the parametric form of the distribution is not always known, and in that case nonparametric density estimation methods are used. These methods usually become computationally demanding as the number of components increases. In this paper, the accuracy of several nonparametric density estimators is compared by means of simulation. The following approaches are considered: an adaptive bandwidth kernel estimator, a projection pursuit estimator, a logspline estimator, and a k-nearest neighbor estimator. It is concluded that data clustering as a pre-processing step improves the estimation of mixture densities. However, when the data do not have clearly defined clusters, the pre-processing step offers little advantage. The application of the density estimators is illustrated using municipal solid waste data collected in Kaunas (Lithuania). The data distribution is similar (i.e., with a kurtotic unimodal density) to the benchmark distribution introduced by Marron and Wand. Based on the homogeneity tests, it can be concluded that the distributions of the municipal solid waste fractions in Kutaisi (Georgia), Saint-Petersburg (Russia), and Boryspil (Ukraine) do not differ statistically from the distribution of waste fractions in Kaunas. The distribution of the waste data collected in Kaunas (Lithuania) follows the general observations introduced by Marron and Wand (i.e., it has one mode and a certain kurtosis).
Pages: 622-641
Number of pages: 20
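The kind of simulation comparison described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: it uses a fixed-bandwidth Gaussian kernel estimator with Silverman's rule of thumb as a simpler stand-in for the adaptive bandwidth kernel estimator, includes a basic k-nearest neighbor estimator, and omits the projection pursuit and logspline estimators. The two-component mixture, the sample size, the choice k = sqrt(n), the evaluation grid, and the mean absolute error criterion are all assumptions made for the example.

```python
import numpy as np

def sample_mixture(rng, n, weights, means, sds):
    """Draw n points from a univariate Gaussian mixture."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], sds[comp])

def mixture_pdf(x, weights, means, sds):
    """True mixture density evaluated at the points x."""
    z = (x[:, None] - means) / sds
    return (weights * np.exp(-0.5 * z**2) / (sds * np.sqrt(2 * np.pi))).sum(axis=1)

def kde_gauss(x, data):
    """Fixed-bandwidth Gaussian KDE (Silverman's rule); not the adaptive variant."""
    n = data.size
    iqr = np.subtract(*np.percentile(data, [75, 25]))
    h = 0.9 * min(data.std(ddof=1), iqr / 1.34) * n ** (-0.2)
    z = (x[:, None] - data) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

def knn_density(x, data, k):
    """Univariate k-NN estimate: k / (2 * n * distance to the k-th neighbor)."""
    dist = np.abs(x[:, None] - data)
    d_k = np.partition(dist, k - 1, axis=1)[:, k - 1]
    return k / (2.0 * data.size * d_k)

def monte_carlo_mae(n=200, reps=50, seed=0):
    """Average the mean absolute error on a grid over `reps` simulated samples."""
    rng = np.random.default_rng(seed)
    # Hypothetical bimodal test mixture (assumed for illustration only).
    weights = np.array([0.5, 0.5])
    means = np.array([-2.0, 2.0])
    sds = np.array([1.0, 1.0])
    grid = np.linspace(-6.0, 6.0, 241)
    truth = mixture_pdf(grid, weights, means, sds)
    k = int(round(np.sqrt(n)))  # assumed heuristic for the neighbor count
    errs = {"kde": [], "knn": []}
    for _ in range(reps):
        data = sample_mixture(rng, n, weights, means, sds)
        errs["kde"].append(np.mean(np.abs(kde_gauss(grid, data) - truth)))
        errs["knn"].append(np.mean(np.abs(knn_density(grid, data, k) - truth)))
    return {name: float(np.mean(v)) for name, v in errs.items()}

results = monte_carlo_mae()
print(results)
```

Calling `monte_carlo_mae` with other mixture parameters or sample sizes would show how the ranking of the estimators shifts with the shape of the target density, which is the sort of question the study addresses.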