A Class-Cluster k-Nearest Neighbors Method for Temporal In-Trouble Student Identification

Cited by: 2
Authors
Chau Vo [1 ]
Hua Phung Nguyen [1 ]
Affiliations
[1] Vietnam Natl Univ, Ho Chi Minh City Univ Technol, Ho Chi Minh City, Vietnam
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT I | 2019, Vol. 11431
Keywords
Student classification; Educational data mining; k-nearest neighbors; Clustering ensemble; Fisher's discriminant ratio; Data imbalance
DOI
10.1007/978-3-030-14799-0_19
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Temporal in-trouble student identification is a program-level classification task that predicts the final study status of a current student at the end of his/her study period, using data gathered from past students. The task focuses on correct predictions for the in-trouble students, whose predicted labels correspond to the lowest performance level. Educational datasets for this task exhibit several challenging characteristics: multiple classes, class overlap, and class imbalance. Handling these characteristics simultaneously has not yet been investigated in educational data mining, and the methods of existing general-purpose works are not straightforwardly applicable to educational datasets. Therefore, this paper proposes a novel method as an effective solution to the task defined above. Combining the traditional k-nearest neighbors and clustering ensemble methods, our method introduces three new features: it relaxes the number k of nearest neighbors, uses a set of cluster-based neighbors newly generated by partitioning the subspace of each class, and applies four new criteria to decide the final class label instead of the majority voting scheme. The result is a new lazy learning method able to correctly predict more instances belonging to a positive minority class. In an empirical evaluation, higher Accuracy, Recall, and F-measure confirmed the effectiveness of our method compared to several popular methods on two real educational datasets and the benchmark "Iris" dataset.
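The core idea described in the abstract can be illustrated with a minimal sketch: cluster the instances of each class separately (the "cluster-based neighbors" generated by partitioning each class's subspace), then label a query by its proximity to the resulting per-class cluster centroids. This is not the paper's actual algorithm; in particular, the paper's relaxed k and its four decision criteria are simplified here to a plain nearest-centroid assignment, and all names and data are illustrative.

```python
import random
from math import dist

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on a list of tuples; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, centroids[j]))
            groups[i].append(p)
        # Recompute centroids (keep the old one if a group went empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def fit_class_clusters(X, y, k=2):
    """Partition each class's subspace into up to k clusters."""
    model = {}
    for label in set(y):
        pts = [x for x, lbl in zip(X, y) if lbl == label]
        model[label] = kmeans(pts, min(k, len(pts)))
    return model

def predict(model, x):
    """Label a query by the class owning the nearest cluster centroid."""
    return min(
        ((label, dist(x, c)) for label, cs in model.items() for c in cs),
        key=lambda t: t[1],
    )[0]

# Toy 2-D data: two well-separated classes standing in for study statuses.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.1), (5.0, 5.0), (5.2, 4.9), (6.0, 6.1)]
y = ["pass", "pass", "pass", "in_trouble", "in_trouble", "in_trouble"]
model = fit_class_clusters(X, y, k=2)
print(predict(model, (0.1, 0.2)))   # lies in the "pass" region
print(predict(model, (5.5, 5.5)))   # lies in the "in_trouble" region
```

Because each class is clustered independently, a small minority class keeps its own representatives rather than being outvoted by majority-class neighbors, which is the intuition behind the paper's improved recall on the positive minority class.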
Pages: 219-230 (12 pages)