Partition-Based Clustering Algorithms Applied to Mixed Data for Educational Data Mining: A Survey From 1971 to 2024

Cited by: 0
Authors
Dutt, Ashish [1]
Ismail, Maizatul Akmar [2]
Herawan, Tutut [2]
Hashem, Ibrahim Abaker [3]
Affiliations
[1] Monash Univ Malaysia, Sch Sci, Jalan Lagoon Selatan, Bandar 47500, Selangor, Malaysia
[2] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 57600, Selangor, Malaysia
[3] Univ Sharjah, Coll Comp & Informat, Dept Comp Sci, Sharjah, U Arab Emirates
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Clustering algorithms; unsupervised learning; data mining; K-MEANS ALGORITHM; GENERAL COEFFICIENT; SIMILARITY; DISTANCE; ASSOCIATION; LEARNER; PATTERNS;
DOI
10.1109/ACCESS.2024.3496929
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Educational Data Mining (EDM) is the application of data mining methods in the educational domain. EDM datasets commonly contain mixed data (i.e., both text and numeric data types). Grouping or clustering such data is challenging because similarity between mixed-type records is poorly defined. Existing partition clustering algorithms for handling such data follow one of two approaches: data type conversion, where all variables are converted to a single data type, and a mixed approach, where the similarity measures of the different data types are merged into a single similarity measure, either through a weighted sum as in Gower's distance or through a mixed dissimilarity function as used in the k-Medoids algorithm. Data type conversion causes information loss, an aspect that is not discussed in the existing research literature. This study systematically reviews fifty-three years (1971 to 2024) of research on partition clustering algorithms applied to mixed data in EDM. A review of 104 research articles found that most partitional clustering algorithms handle continuous or categorical variables but not mixed variables. Researchers and practitioners often cite the lack of methods for analyzing continuous and categorical variables together. Developing machine learning algorithms that can handle the mixed data inherent in the educational domain is therefore becoming increasingly important. In addition to a comparative analysis and an analysis based on several factors, this article identifies research gaps and outlines directions for future work.
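As an illustration of the weighted-sum idea the abstract attributes to Gower's distance, the following is a minimal Python sketch, not the implementation used in the surveyed article. It assumes the common formulation in which a numeric variable contributes its absolute difference divided by the variable's range and a categorical variable contributes 0 on a match and 1 otherwise. The function name `gower_distance`, the `ranges` convention (`None` marks a categorical variable), and the student records are hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of per-variable dissimilarities for two mixed records.

    ranges[k] is the value range of numeric variable k, or None when
    variable k is categorical (an assumed convention for this sketch).
    """
    if weights is None:
        weights = [1.0] * len(a)
    total = 0.0
    for x, y, value_range, w in zip(a, b, ranges, weights):
        if value_range is None:
            # Categorical variable: simple matching (0 if equal, else 1).
            d = 0.0 if x == y else 1.0
        elif value_range > 0:
            # Numeric variable: range-normalized absolute difference.
            d = abs(x - y) / value_range
        else:
            # Constant numeric variable carries no information.
            d = 0.0
        total += w * d
    return total / sum(weights)


# Hypothetical student records: (exam score, attendance %, study mode).
s1 = (80.0, 90.0, "online")
s2 = (60.0, 90.0, "campus")
ranges = [100.0, 100.0, None]

print(gower_distance(s1, s2, ranges))
```

Because each variable is compared in its own data type, no variable has to be converted, which is the contrast with the conversion approach that the abstract draws.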
Pages: 172923 - 172942
Page count: 20
Related papers
104 in total
  • [1] Agresti Alan., 2010, Wiley Series in Probability and Statistics, V2nd
  • [2] A k-mean clustering algorithm for mixed numeric and categorical data
    Ahmad, Amir
    Dey, Lipika
    [J]. DATA & KNOWLEDGE ENGINEERING, 2007, 63 (02) : 503 - 527
  • [3] initKmix-A novel initial partition generation algorithm for clustering mixed data using k-means-based clustering
    Ahmad, Amir
    Khan, Shehroz S.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 167
  • [4] Survey of State-of-the-Art Mixed Data Clustering Algorithms
    Ahmad, Amir
    Khan, Shehroz S.
    [J]. IEEE ACCESS, 2019, 7 : 31883 - 31902
  • [5] K-Harmonic means type clustering algorithm for mixed datasets
    Ahmad, Amir
    Hashmi, Sarosh
    [J]. APPLIED SOFT COMPUTING, 2016, 48 : 39 - 49
  • [6] Predicting Students' Academic Procrastination in Blended Learning Course Using Homework Submission Data
    Akram, Aftab
    Fu, Chengzhou
    Li, Yuyao
    Javed, Muhammad Yaqoob
    Lin, Ronghua
    Jiang, Yuncheng
    Tang, Yong
    [J]. IEEE ACCESS, 2019, 7 : 102487 - 102498
  • [7] Clustering Learners according to their Collaboration
    Anaya, Antonio R.
    Boticario, Jesus G.
    [J]. 2009 13TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, 2009 : 540 - 545
  • [8] Educational Data Mining Clustering Approach: Case Study of Undergraduate Student Thesis Topic
    Andre
    Suciati, Nanik
    Fabroyir, Hadziq
    Pardede, Eric
    [J]. IEEE ACCESS, 2023, 11 : 130072 - 130088
  • [9] Araripe PP, 2023, Arxiv, DOI arXiv:2307.02966
  • [10] Bahel Vedant, 2021, 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), P481, DOI 10.1109/ICCIKE51210.2021.9410741