A semi-supervised clustering-based classification model for classifying imbalanced data streams in the presence of scarcely labelled data

Cited by: 2
Authors
Bhowmick K. [1]
Narvekar M. [1]
Affiliations
[1] Department of Computer Engineering, D.J. Sanghvi College of Engineering, Mumbai
Keywords
Data streams; Expectation maximisation; Imbalanced data; Partially labelled; Semi-supervised clustering
DOI
10.1504/IJBIDM.2022.120827
Abstract
Data streams are potentially infinite in length, fast changing and scarcely labelled, and it is practically impossible to label every observed instance. Online frameworks for classifying data streams are generally supervised: they assume that labelled data are available and hence cannot be used for scarcely labelled streams. Semi-supervised learning (SSL) addresses this problem by using a large amount of unlabelled data together with the labelled data to build classifiers. Data streams may also suffer from class imbalance. Previous work on learning from data streams has analysed the problem of imbalanced data, but to the best of our knowledge no work so far has applied semi-supervised learning approaches to classifying imbalanced data streams. This paper proposes a model that uses a semi-supervised clustering technique to classify an imbalanced data stream in the presence of scarcely labelled data. The results show that the model outperforms many state-of-the-art techniques. © 2022 Inderscience Enterprises Ltd.
Pages: 170-191
Number of pages: 21
Related papers
24 items in total
[1] Beigy H., Ahmadi Z., Semi-supervised Ensemble Learning of Data Streams in the Presence of Concept Drift, pp. 526-537, (2012)
[2] Bhowmick K., Narvekar M., Learning to Classify Non-Stationary Imbalanced Data Streams - Issues and Challenges, (2018)
[3] Bhowmick K., Narvekar M., Khatkhatay M.A., A comprehensive study and analysis of semi supervised learning techniques, International Journal of Engineering Research & Technology (IJERT), 8, 11, pp. 810-816, (2019)
[4] Chapelle O., Schölkopf B., Zien A., Semi-Supervised Learning, (2006)
[5] Dua D., Graff C., UCI Machine Learning Repository, (2019)
[6] Dyer K.B., Polikar R., Semi-supervised learning in initially labeled non-stationary environments with gradual drift, WCCI 2012 IEEE World Congress on Computational Intelligence, (2012)
[7] Dyer K.B., Capo R., Polikar R., COMPOSE: a semisupervised learning framework for initially labeled nonstationary streaming data, IEEE Transactions on Neural Networks and Learning Systems, 25, 1, pp. 12-26, (2014)
[8] Fan W., Systematic data selection to mine concept-drifting data streams, KDD'04, 10th International Conference on Knowledge Discovery and Data Mining, pp. 128-137, (2004)
[9] Han J., Kamber M., Pei J., Data Mining: Concepts and Techniques, (2006)
[10] Harries M., Splice-2 Comparative Evaluation: Electricity Pricing, (1999)