A novel ensemble approach for heterogeneous data with active learning

Times cited: 9
Authors
Salama, Mohamed [1]
Abdelkader, Hatem [1]
Abdelwahab, Amira [1,2]
Affiliations
[1] Menoufia Univ, Fac Comp & Informat, Dept Informat Syst, Shibin Al Kawm 32511, Egypt
[2] King Faisal Univ, Coll Comp Sci & Informat Technol, Dept Informat Syst, Al Hasa, Saudi Arabia
Source
INTERNATIONAL JOURNAL OF ENGINEERING BUSINESS MANAGEMENT | 2022 / Vol. 14
Keywords
Heterogeneous data; text classification; active learning; machine learning; information extraction; ensemble method; natural language processing; CLASSIFICATION
DOI
10.1177/18479790221082605
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
At present, millions of internet users contribute huge amounts of data. This data is extremely heterogeneous, which makes it hard to analyze and to derive information from, even though it is an indispensable source for decision-makers. Because of this massive growth, data classification and analysis have become important research subjects, and extracting information from this data has become a necessity. These enormous volumes of data must therefore be processed to uncover hidden information, improving data analysis and, in turn, classification accuracy. In this paper, we first develop an ensemble machine-learning model based on active learning that identifies the most effective feature extraction strategy for heterogeneous data analysis, and we compare it with traditional machine-learning algorithms. Second, we evaluate the proposed model experimentally on five heterogeneous datasets from various domains: the Health Care Reform dataset, the Sander Frandsen dataset, the Financial Phrase Bank dataset, the SMS Spam Collection dataset, and the Textbook sales dataset. According to the results, the novel approach to data analysis performed better than conventional methods. Finally, the findings confirm the validity of the proposed technique and meet the study's goal of using ensemble methods with active learning to raise the model's overall accuracy in classifying and analyzing heterogeneous data, to reduce the time and money spent training the model, and to deliver superior analysis performance as well as insights into other aspects of extracting information from heterogeneous data.
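The abstract does not specify the base learners, the feature extraction, or the query strategy used. As a rough illustration only, the sketch below combines an ensemble (a bagged committee of multinomial Naive Bayes classifiers) with pool-based active learning that uses query-by-committee with vote entropy. The whitespace tokenizer, the committee size of 5, and every name here (`NaiveBayes`, `train_committee`, `vote_entropy`, `majority_vote`, `active_learning`, the `oracle` callback) are hypothetical assumptions for illustration, not the authors' model.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: a bagged Naive Bayes committee with query-by-committee
# active learning. Not the authors' model; all names are illustrative.


def tokenize(text):
    # Deliberately simple feature extraction: lowercase whitespace tokens.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab_size = len({w for c in self.classes for w in self.counts[c]})
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            score = math.log(self.prior[c] / self.n)
            for w in tokenize(doc):
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + self.vocab_size))
            return score
        return max(self.classes, key=log_score)


def train_committee(docs, labels, size=5, seed=0):
    # Bagging: each member is trained on a bootstrap resample of the labeled set.
    rng = random.Random(seed)
    committee = []
    for _ in range(size):
        idx = [rng.randrange(len(docs)) for _ in range(len(docs))]
        committee.append(NaiveBayes().fit([docs[i] for i in idx], [labels[i] for i in idx]))
    return committee


def vote_entropy(committee, doc):
    # Disagreement measure: entropy of the committee's vote distribution.
    votes = Counter(m.predict(doc) for m in committee)
    k = len(committee)
    return -sum((v / k) * math.log(v / k) for v in votes.values())


def majority_vote(committee, doc):
    # Ensemble prediction: the most common label among committee members.
    return Counter(m.predict(doc) for m in committee).most_common(1)[0][0]


def active_learning(labeled, pool, oracle, budget, size=5, seed=0):
    # Each round: retrain the committee, query the pool item it disputes most,
    # ask the oracle (e.g. a human annotator) for its label, add it to the labeled set.
    labeled, pool = list(labeled), list(pool)
    for step in range(budget):
        if not pool:
            break
        docs, labels = zip(*labeled)
        committee = train_committee(list(docs), list(labels), size, seed + step)
        query = max(pool, key=lambda d: vote_entropy(committee, d))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    docs, labels = zip(*labeled)
    return train_committee(list(docs), list(labels), size, seed + budget), labeled
```

Vote entropy is zero when all members agree and largest when the votes split evenly, so each round spends the labeling budget on the example the committee disputes most. Reducing labeling effort in this way is the usual motivation for pairing active learning with an ensemble.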
Pages: 10