On the feature engineering of building energy data mining

Cited by: 82
Authors
Zhang, Chuan [1, 2]
Cao, Liwei [2, 3]
Romagnoli, Alessandro [1, 2]
Affiliations
[1] Nanyang Technol Univ, Sch Mech & Aerosp Engn, 50 Nanyang Ave, Singapore, Singapore
[2] Cambridge Ctr Adv Res Energy Efficiency Singapore, 1 Create Way, Singapore, Singapore
[3] Univ Cambridge, Dept Chem Engn & Biotechnol, Philippa Fawcett Dr, Cambridge, England
Funding
National Research Foundation, Singapore
Keywords
Building energy; Feature engineering; Exploratory data analysis; Principal component analysis; Random forest; PREDICTION; CONSUMPTION; ELECTRICITY; KNOWLEDGE; FRAMEWORK; SAVINGS;
DOI
10.1016/j.scs.2018.02.016
Chinese Library Classification number
TU [Building Science]
Subject classification code
0813
Abstract
Understanding the underlying dynamics of building energy consumption is the first step towards energy saving in the building sector, and data mining, as a powerful tool for knowledge discovery, is being applied to this domain with increasing frequency. However, most previous research has focused on model development within the data mining pipeline, while feature engineering has largely been overlooked. To fill this gap, this paper investigates three feature engineering approaches: exploratory data analysis (EDA) as a feature visualization method, random forest (RF) as a feature selection method, and principal component analysis (PCA) as a feature extraction method. The methods are tested on a building energy consumption dataset with 124 features describing building physics, weather conditions, and occupant behavior, and the 124 features are analyzed and ranked. Although feature importance depends on the specific machine learning model, certain features are found to always dominate the feature space. The results favor effective yet computationally cheap feature engineering methods such as EDA. For other building energy data mining problems, the proposed method still holds important implications, since it provides a starting point from which efficient feature engineering and machine learning models can be further developed.
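To make the idea of PCA as a feature extraction method concrete, the following is a minimal sketch on synthetic data. It is not the authors' implementation and does not use the paper's 124-feature dataset; the function name `pca_extract`, the array shapes, and the random stand-in data are all assumptions chosen for illustration.

```python
import numpy as np


def pca_extract(X, n_components):
    """Project X (samples x features) onto its top principal components.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Center each feature so the SVD captures variance, not the mean offset.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]
    # Extracted features: coordinates of each sample along the components.
    scores = X_centered @ components.T
    return scores, components, ratio[:n_components]


# Synthetic stand-in data (assumption): 6 observed features driven by
# 2 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 6))

scores, components, ratio = pca_extract(X, n_components=2)
print("explained variance ratio:", ratio)
```

On real building data, where features carry different units, each column would typically be standardized before this step so that no single feature dominates purely because of its scale.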
Pages: 508-518
Number of pages: 11