Searching for an Optimal Partition of Incomplete Data with Application in Modeling Energy Efficiency of Public Buildings

Cited by: 5
Authors
Scitovski, Rudolf [1]
Susac, Marijana Zekic [2]
Has, Adela [2]
Affiliations
[1] Univ Osijek, Dept Math, Trg Ljudevita Gaja 6, Osijek 31000, Croatia
[2] Univ Osijek, Fac Econ, Trg Ljudevita Gaja 7, Osijek 31000, Croatia
Keywords
clustering; incomplete data; missing data; optimal partition; energy efficiency of public buildings
DOI
10.17535/crorr.2018.0020
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
In this paper, we consider the problem of searching for an optimal partition with the most appropriate number of clusters for an incomplete data set $A \subset \mathbb{R}^n$ in which several outliers might occur. Special attention is given to the application of the Least Squares distance-like function. A procedure for preparing the incomplete data set and an outlier elimination procedure are proposed such that the clustering process gives acceptable solutions. Justifications with proofs are provided for these procedures. An incremental algorithm for searching for optimal partitions with 2, 3, ... clusters is applied to the prepared data set. The most appropriate number of clusters is then determined by using the Davies-Bouldin and the Calinski-Harabasz indexes. The whole procedure is organized as an algorithm given in the paper. To illustrate its applicability, the above steps are applied to a real data set of public buildings and their energy efficiency data, yielding clear clusters that could be used for further modeling procedures.
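The model-selection step described in the abstract (compute partitions with 2, 3, ... clusters, then pick the number of clusters using the Calinski-Harabasz and Davies-Bouldin indexes) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the data are already complete and free of outliers, it substitutes plain k-means with a deterministic farthest-point start for the paper's incremental algorithm, and the toy data set `POINTS` and all function names are hypothetical.

```python
import math

def sq_dist(p, q):
    # Least Squares distance-like function: squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=100):
    # Assumption: plain k-means stands in for the paper's incremental algorithm.
    # Deterministic start: first point, then repeatedly the point farthest
    # from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers

def calinski_harabasz(points, labels, centers):
    # Ratio of between-cluster to within-cluster dispersion; larger is better.
    n, k = len(points), len(centers)
    overall = mean(points)
    within = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in range(k))
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(points, labels, centers):
    # Average worst-case ratio of cluster scatter to center separation;
    # smaller is better. Assumes every cluster is non-empty.
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(math.sqrt(sq_dist(p, centers[j])) for p in members) / len(members))
    worst = [
        max((scatter[i] + scatter[j]) / math.sqrt(sq_dist(centers[i], centers[j]))
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Hypothetical toy data: three well-separated groups of four points each.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

scores = {}
for k in range(2, 5):
    labels, centers = kmeans(POINTS, k)
    scores[k] = (calinski_harabasz(POINTS, labels, centers),
                 davies_bouldin(POINTS, labels, centers))

best_ch = max(scores, key=lambda k: scores[k][0])  # maximize Calinski-Harabasz
best_db = min(scores, key=lambda k: scores[k][1])  # minimize Davies-Bouldin
print(best_ch, best_db)
```

On this toy data both indexes agree on the same number of clusters. On real, incomplete data the two indexes may disagree, and the data preparation and outlier elimination steps proposed in the paper would have to run first.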
Pages: 255-268
Page count: 14