Mutual information and sensitivity analysis for feature selection in customer targeting: A comparative study

Cited by: 31
Authors
Barraza, Nestor [1]
Moro, Sergio [2]
Ferreyra, Marcelo [3]
de la Pena, Adolfo [4]
Affiliations
[1] Univ Buenos Aires, Univ Nacl Tres Febrero, Caseros & Sch Engn, Buenos Aires, DF, Argentina
[2] IUL, ISTAR, ISCTE, Lisbon, Portugal
[3] Dataxplore, Buenos Aires, DF, Argentina
[4] Boldt Gaming, Buenos Aires, DF, Argentina
Keywords
Customer targeting; direct marketing; feature selection; modelling; mutual information; sensitivity analysis; PREDICTION; MODELS
DOI
10.1177/0165551518770967
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection is a highly relevant task in any data-driven knowledge discovery project. The present research analyses the advantages and disadvantages of using mutual information (MI) and data-based sensitivity analysis (DSA) for feature selection in classification problems by applying both to a bank telemarketing case. A logistic regression model is built on the tuned set of features that each technique identifies as most influential on the success of a telemarketing contact: 13 features for MI and 9 for DSA. DSA performs better at lower false-positive values, while MI is slightly better at a higher false-positive ratio. Thus, MI is the better choice if the intention is to reduce the cost of contacts slightly without risking the loss of a high number of successes. However, DSA achieved good prediction results with fewer features.
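As a rough illustration of the MI side of the comparison, the sketch below ranks discrete features by their estimated mutual information with a binary outcome. It is not the authors' procedure: this record does not describe their estimator or tuning, and the feature names (`prior_contact`, `weekday`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(features, target):
    """Return (name, MI) pairs sorted from most to least informative."""
    scores = {name: mutual_information(vals, target)
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 1 = successful contact, 0 = unsuccessful.
outcome = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "prior_contact": [1, 1, 0, 0, 1, 0, 1, 0],  # mirrors the outcome exactly
    "weekday":       [0, 1, 0, 1, 1, 0, 0, 1],  # independent of the outcome
}

ranking = rank_features(features, outcome)
```

In a selection workflow of the kind the abstract describes, the top-ranked features would then be passed to a classifier such as logistic regression; how many to keep (13 for MI in the study) is a tuning decision that this sketch does not address.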
Pages: 53-67
Number of pages: 15