Dynamic feature scaling for online learning of binary classifiers

Cited by: 31
Author
Bollegala, Danushka [1]
Affiliation
[1] Univ Liverpool, Liverpool, Merseyside, England
Keywords
Feature scaling; Online learning; Classification; Algorithms
DOI
10.1016/j.knosys.2017.05.010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Scaling feature values is an important step in numerous machine learning tasks. Different features can have different value ranges, and some form of feature scaling is often required in order to learn an accurate classifier. However, feature scaling is conventionally conducted as a preprocessing task prior to learning. This is problematic in an online setting for two reasons. First, it might not be possible to accurately determine the value range of a feature at the initial stages of learning, when we have observed only a handful of training instances. Second, the distribution of data can change over time, which renders obsolete any feature scaling that we perform in a preprocessing step. We propose a simple but effective method to dynamically scale features at train time, thereby quickly adapting to any changes in the data stream. We compare the proposed dynamic feature scaling method against more complex methods for estimating scaling parameters using several benchmark datasets for classification. Our proposed feature scaling method consistently outperforms more complex methods on all of the benchmark datasets and improves the classification accuracy of a state-of-the-art online classification algorithm. © 2017 Elsevier B.V. All rights reserved.
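
To make the idea of scaling at train time concrete, the following is a minimal sketch of one generic way to standardize features on a data stream, using a single-pass running mean and variance (Welford's update). It is not the method proposed in the paper, whose details this record does not give. The class name `RunningScaler`, its methods, and the feature names in the toy stream are all hypothetical, chosen only for illustration.

```python
import math


class RunningScaler:
    """Hypothetical helper: per-feature running mean and variance.

    Assumption: instances are sparse dicts {feature_name: value}, and
    statistics are updated only for features present in an instance.
    """

    def __init__(self):
        self.count = {}  # number of observations seen per feature
        self.mean = {}   # running mean per feature
        self.m2 = {}     # running sum of squared deviations per feature

    def update(self, x):
        """Fold one instance into the running statistics (Welford's update)."""
        for name, value in x.items():
            n = self.count.get(name, 0) + 1
            mean = self.mean.get(name, 0.0)
            delta = value - mean
            mean += delta / n
            self.count[name] = n
            self.mean[name] = mean
            self.m2[name] = self.m2.get(name, 0.0) + delta * (value - mean)

    def transform(self, x):
        """Standardize an instance with the statistics seen so far."""
        scaled = {}
        for name, value in x.items():
            n = self.count.get(name, 0)
            var = self.m2.get(name, 0.0) / n if n > 0 else 0.0
            std = math.sqrt(var)
            # With zero variance (e.g. a single observation) fall back to 0.0.
            centred = value - self.mean.get(name, 0.0)
            scaled[name] = centred / std if std > 0 else 0.0
        return scaled


# Toy stream: each instance updates the statistics, then is scaled
# before it would be passed to an online classifier.
scaler = RunningScaler()
stream = [
    {"len": 10.0, "tf": 0.1},
    {"len": 20.0, "tf": 0.3},
    {"len": 30.0, "tf": 0.2},
]
for x in stream:
    scaler.update(x)
    z = scaler.transform(x)
```

Because the statistics are revised with every instance, the scaling needs no pass over the data before learning starts. A running average over the whole stream adapts slowly to distribution change, though; a decayed or windowed estimate would be one possible variation.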
Pages: 97-105
Page count: 9