Forecasting Smart Meter Energy Usage using Distributed Systems and Machine Learning

Cited by: 10
Authors
Dong, Chris [1]
Du, Lingzhi [1]
Ji, Feiran [1]
Song, Zizhen [1]
Zheng, Yuedi [1]
Howard, Alexander J. [1]
Intrevado, Paul [1]
Woodbridge, Diane Myung-kyung [1]
Affiliation
[1] Univ San Francisco, Data Sci Program, San Francisco, CA 94117 USA
Source
IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018
Keywords
Distributed computing; Distributed databases; Machine learning; Data processing; Smart grids; CONSUMPTION;
DOI
10.1109/HPCC/SmartCity/DSS.2018.00216
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
In this research, we explore the technical and computational merits of a machine learning algorithm on a large data set, employing distributed systems. Using 167 million (10 GB) energy consumption observations collected by smart meters from residential consumers in London, England, we predict future residential energy consumption using a Random Forest machine learning algorithm. Distributed systems such as AWS S3 and EMR, MongoDB and Apache Spark are used. Computational times and predictive accuracy are evaluated. We conclude that there are significant computational advantages to using distributed systems when applying machine learning algorithms on large-scale data. We also observe that distributed systems can be computationally burdensome when the amount of data being processed is below a threshold at which it can leverage the computational efficiencies provided by distributed systems.
Pages: 1293-1298
Number of pages: 6