A Class-Cluster k-Nearest Neighbors Method for Temporal In-Trouble Student Identification

Cited by: 2
Authors
Chau Vo [1 ]
Hua Phung Nguyen [1 ]
Affiliations
[1] Vietnam Natl Univ, Ho Chi Minh City Univ Technol, Ho Chi Minh City, Vietnam
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT I | 2019, Vol. 11431
Keywords
Student classification; Educational data mining; k-nearest neighbors; Clustering ensemble; Fisher's discriminant ratio; Data imbalance
DOI
10.1007/978-3-030-14799-0_19
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Temporal in-trouble student identification is a program-level classification task that predicts the final study status of a current student at the end of his/her study period, using data gathered from past students. The task focuses on correct predictions for the in-trouble students, whose predicted labels correspond to the lowest performance level. Educational datasets for this task exhibit several challenging characteristics: multiple classes, class overlap, and class imbalance. Handling these characteristics simultaneously has not yet been investigated in educational data mining, and the methods of existing general-purpose works are not straightforwardly applicable to educational datasets. Therefore, this paper proposes a novel method as an effective solution to the task defined above. Combining the traditional k-nearest neighbors and clustering ensemble methods, our method introduces three new features: it relaxes the number k of nearest neighbors, uses a set of cluster-based neighbors newly generated by partitioning the subspace of each class, and applies four new criteria to decide the final class label instead of the majority voting scheme. The result is a new lazy learning method able to correctly predict more instances belonging to a positive minority class. In an empirical evaluation, higher Accuracy, Recall, and F-measure confirmed the effectiveness of our method compared to several popular methods on two real educational datasets and the benchmark "Iris" dataset.
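The core idea described in the abstract can be illustrated with a minimal sketch: cluster the instances of each class separately (the "cluster-based neighbors" generated by partitioning each class's subspace), then label a query by its proximity to the resulting per-class cluster centroids. This is not the paper's actual algorithm; in particular, the paper's relaxed k and its four decision criteria are simplified here to a plain nearest-centroid assignment, and all names and data are illustrative.

```python
import random
from math import dist

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on a list of tuples; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, centroids[j]))
            groups[i].append(p)
        # Recompute centroids (keep the old one if a group went empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def fit_class_clusters(X, y, k=2):
    """Partition each class's subspace into up to k clusters."""
    model = {}
    for label in set(y):
        pts = [x for x, lbl in zip(X, y) if lbl == label]
        model[label] = kmeans(pts, min(k, len(pts)))
    return model

def predict(model, x):
    """Label a query by the class owning the nearest cluster centroid."""
    return min(
        ((label, dist(x, c)) for label, cs in model.items() for c in cs),
        key=lambda t: t[1],
    )[0]

# Toy 2-D data: two well-separated classes standing in for study statuses.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.1), (5.0, 5.0), (5.2, 4.9), (6.0, 6.1)]
y = ["pass", "pass", "pass", "in_trouble", "in_trouble", "in_trouble"]
model = fit_class_clusters(X, y, k=2)
print(predict(model, (0.1, 0.2)))   # lies in the "pass" region
print(predict(model, (5.5, 5.5)))   # lies in the "in_trouble" region
```

Because each class is clustered independently, a small minority class keeps its own representatives rather than being outvoted by majority-class neighbors, which is the intuition behind the paper's improved recall on the positive minority class.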
Pages: 219-230 (12 pages)