Big data analytics for identifying electricity theft using machine learning approaches in microgrids for smart communities

Cited by: 20
Authors
Arif, Arooj [1]
Javaid, Nadeem [1]
Aldegheishem, Abdulaziz [2]
Alrajeh, Nabil [3]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Islamabad 44000, Pakistan
[2] King Saud Univ KSU, Coll Architecture & Planning, Urban Planning Dept, Riyadh, Saudi Arabia
[3] King Saud Univ KSU, Biomed Technol Dept, Coll Appl Med Sci, Riyadh, Saudi Arabia
Keywords
big data; electricity theft detection; hyperactive optimization toolkit; machine learning; smart grids; urban planning; IMBALANCED DATA; OPTIMIZATION; SYSTEMS
DOI
10.1002/cpe.6316
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Electricity theft (ET) causes major revenue loss for power utilities. It degrades the quality of supply, raises production costs, forces legitimate consumers to pay higher costs, and harms the economy as a whole. In this article, we use the State Grid Corporation of China (SGCC) dataset, which contains 1035 days of electricity consumption data for two classes: normal and fraudulent. We propose an ET detection model that consists of four steps: interpolation, data balancing, feature extraction, and classification. First, missing values in the dataset are recovered using interpolation. Second, the data are resampled: ET consumers make up only 9% of the SGCC dataset, and this imbalance prevents a model from classifying both classes (normal and theft) correctly. To address this, a hybrid resampling technique is proposed, named synthetic minority oversampling technique with near miss. Third, a residual network extracts latent features from the SGCC dataset. Fourth, three tree-based classifiers, namely decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost), are trained on the encoded feature vectors for classification. In addition, searching for good hyperparameters is a challenging task that is usually done manually and takes a considerable amount of time. To resolve this problem, a Bayesian optimizer is used to simplify the tuning of DT, RF, and AdaBoost. Finally, the results indicate that RF outperforms DT and AdaBoost.
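The first two steps of the pipeline described in the abstract (interpolation, then hybrid oversampling plus undersampling) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the interpolation rule or the near-miss variant, so linear interpolation and a NearMiss-1 style selection are assumed here, and all function names (`interpolate_missing`, `smote_like`, `near_miss_like`, `rebalance`) are hypothetical. A real experiment would more likely rely on a library such as imbalanced-learn, which provides `SMOTE` and `NearMiss` classes.

```python
import math
import random


def interpolate_missing(series):
    """Fill None/NaN gaps linearly between the nearest observed readings.

    Gaps at either end of the series copy the nearest observed value.
    """
    def observed(v):
        return v is not None and not math.isnan(v)

    known = [i for i, v in enumerate(series) if observed(v)]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if observed(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            lv, rv = series[left], series[right]
            filled[i] = lv + (rv - lv) * (i - left) / (right - left)
    return filled


def smote_like(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling (needs at least two minority samples).

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def near_miss_like(majority, minority, n_keep, k=3):
    """NearMiss-1 style undersampling (assumed variant).

    Keeps the n_keep majority samples whose average distance to their
    k nearest minority samples is smallest.
    """
    def avg_dist(p):
        nearest = sorted(math.dist(p, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    return sorted(majority, key=avg_dist)[:n_keep]


def rebalance(majority, minority, target, k=3, seed=0):
    """Oversample the minority and undersample the majority to `target` each."""
    rng = random.Random(seed)
    new_minority = minority + smote_like(minority, target - len(minority), k, rng)
    new_majority = near_miss_like(majority, minority, target, k)
    return new_majority, new_minority


if __name__ == "__main__":
    print(interpolate_missing([1.0, None, 3.0]))
    normal = [[float(i), 0.0] for i in range(10)]   # toy "normal" consumers
    theft = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]    # toy "theft" consumers
    kept, grown = rebalance(normal, theft, target=6)
    print(len(kept), len(grown))
```

Meeting in the middle (here `target=6` for both classes) reflects the general motivation for hybrid resampling: the minority class does not have to be inflated all the way to the majority size, and fewer majority records are discarded than with undersampling alone. The later steps (residual-network feature extraction, tree-based classifiers, Bayesian hyperparameter tuning) are not sketched here because the abstract gives no architectural or search-space details.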
Pages: 21