GMM;
MIGMM;
chi-squared distribution;
Mahalanobis distance;
Adaptive optimal number;
Adaptive interval;
IMPROVED EM ALGORITHM;
INFORMATION CRITERION;
ORDER SELECTION;
CREDIT RISK;
K-MEANS;
IDENTIFICATION;
PREDICTION;
APPROXIMATION;
SYSTEMS;
DOI:
10.1016/j.jocs.2022.101874
Chinese Library Classification number:
TP39 [Computer applications];
Discipline classification codes:
081203;
0835;
Abstract:
This study proposes a method for adaptively locating an optimal number of components (M) when a Gaussian mixture model (GMM) is fitted to a given dataset; the method avoids the underfitting and overfitting caused by an unreasonable, manually specified interval for M. The main contributions are as follows. (1) An adaptive interval for M, denoted $M \in [M_{\mathrm{Min}}^{\mathrm{Ada}}, M_{\mathrm{Max}}^{\mathrm{Ada}}]$, is determined via an adjustable parameter $\beta$, based on two procedures of a novel method, the modified incremental Gaussian mixture model (MIGMM). (2) Using several typical criteria, the optimal number within the adaptive interval, $M_{\mathrm{Opt}}^{\mathrm{Ada}}$, is then determined. Extensive experiments on typical synthetic datasets show that the adaptive interval $[M_{\mathrm{Min}}^{\mathrm{Ada}}, M_{\mathrm{Max}}^{\mathrm{Ada}}]$ corresponds to the parameter values $\beta^{\mathrm{Min}} = 10^{-11}$ and $\beta^{\mathrm{Max}} = 10^{-2}$. The performance of the $M_{\mathrm{Opt}}^{\mathrm{Ada}}$ determination under several typical criteria is evaluated on both synthetic and real-world datasets.
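The second step described in the abstract, choosing the best M inside a given interval with a model-selection criterion, can be illustrated with a minimal sketch. It is not the authors' MIGMM procedure: the interval bounds `m_min` and `m_max` are supplied by hand here (MIGMM and the parameter beta are not reproduced), the data are one-dimensional, BIC stands in for the "typical criteria", and the function names `fit_gmm_1d` and `select_m` are hypothetical.

```python
import math


def mixture_terms(x, weights, means, variances):
    """Weighted Gaussian densities of x under each component."""
    return [w * math.exp(-0.5 * (x - mu) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, mu, v in zip(weights, means, variances)]


def log_likelihood(xs, weights, means, variances):
    """Total log-likelihood of the data under the mixture."""
    return sum(math.log(max(sum(mixture_terms(x, weights, means, variances)), 1e-300))
               for x in xs)


def fit_gmm_1d(xs, m, iters=200, tol=1e-8, var_floor=1e-3):
    """Fit an m-component 1-D GMM with plain EM (assumes len(xs) >= m)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic initialisation: means of m equal-sized chunks of sorted data.
    chunks = [srt[i * n // m:(i + 1) * n // m] for i in range(m)]
    means = [sum(c) / len(c) for c in chunks]
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    variances = [grand_var] * m
    weights = [1.0 / m] * m
    loglik = log_likelihood(xs, weights, means, variances)
    for _ in range(iters):
        # E-step: responsibilities of each component for each point.
        resp = []
        for x in xs:
            terms = mixture_terms(x, weights, means, variances)
            s = max(sum(terms), 1e-300)
            resp.append([t / s for t in terms])
        # M-step: re-estimate weights, means and variances.
        for k in range(m):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, var_floor)  # floor guards against collapse
            weights[k] = nk / n
        new_loglik = log_likelihood(xs, weights, means, variances)
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    return weights, means, variances, loglik


def bic(loglik, m, n):
    """BIC for a 1-D GMM: (m - 1) weights + m means + m variances."""
    n_params = 3 * m - 1
    return n_params * math.log(n) - 2.0 * loglik


def select_m(xs, m_min, m_max):
    """Return the M in [m_min, m_max] with the lowest BIC, plus all scores."""
    scores = {}
    for m in range(m_min, m_max + 1):
        _, _, _, ll = fit_gmm_1d(xs, m)
        scores[m] = bic(ll, m, len(xs))
    best_m = min(scores, key=scores.get)
    return best_m, scores
```

Calling `select_m(data, 1, 4)` on a list of floats returns the selected M and a dictionary of BIC values per candidate. The sketch makes the abstract's point concrete: the result can only be as good as the interval passed in, which is the part the paper proposes to determine adaptively.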
Affiliation:
Univ British Columbia, Irving K Barber Sch Arts & Sci, Dept Stat, Okanagan Campus, 1177 Res Rd, Kelowna, BC V1V 1V7, Canada