Semi-greedy heuristics for feature selection with test cost constraints

Cited by: 34
Authors
Min F. [1]
Xu J. [1]
Affiliation
[1] School of Computer Science, Southwest Petroleum University, Chengdu
Funding
National Natural Science Foundation of China
Keywords
Feature selection; Granular computing; Semi-greedy; Test cost constraint
DOI
10.1007/s41066-016-0017-2
Abstract
In real-world applications, the test cost of data collection should not exceed a given budget. The problem of selecting an informative feature subset under this budget is referred to as feature selection with test cost constraints. Greedy heuristics are a natural and efficient method for this kind of combinatorial optimization problem. However, the recursive selection of locally optimal choices means that the global optimum is often missed. In this paper, we present a three-step semi-greedy heuristic method that directly forms a population of candidate solutions to obtain better results. In the first step, we design the heuristic function. The second step involves the random selection of a feature from the current best k features at each iteration. This is the major difference from conventional greedy heuristics. In the third step, we obtain p candidate solutions and select the best one. Through a series of experiments on four datasets, we compare our algorithm with a classic greedy heuristic approach and an information gain-based λ-weighted greedy heuristic method. The results show that the new approach is more likely to obtain optimal solutions. © 2016, Springer International Publishing Switzerland.
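The three steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the heuristic function or how candidate subsets are scored, so `heuristic` and `quality` below are hypothetical caller-supplied functions, and the toy data (a value-per-cost heuristic with an additive subset value) is invented purely for illustration. The sketch builds `p` budget-feasible subsets, at each iteration choosing uniformly at random among the `k` best-ranked feasible features, and keeps the best subset.

```python
import random
from typing import Callable, Dict, FrozenSet, Optional, Set

# Hypothetical signatures: heuristic(selected, feature) scores a candidate feature
# given the current subset; quality(subset) scores a finished subset.
Heuristic = Callable[[FrozenSet[str], str], float]
Quality = Callable[[FrozenSet[str]], float]


def semi_greedy_select(
    costs: Dict[str, float],
    budget: float,
    heuristic: Heuristic,
    quality: Quality,
    k: int = 2,
    p: int = 10,
    seed: Optional[int] = None,
) -> FrozenSet[str]:
    """Build p budget-feasible subsets semi-greedily and return the best one."""
    rng = random.Random(seed)
    best: FrozenSet[str] = frozenset()
    best_q = quality(best)
    for _ in range(p):
        chosen: Set[str] = set()
        spent = 0.0
        while True:
            # Unselected features whose test cost still fits in the remaining budget.
            feasible = [f for f in costs if f not in chosen and spent + costs[f] <= budget]
            if not feasible:
                break
            current = frozenset(chosen)
            # Step 1: rank feasible features by the heuristic (highest first).
            ranked = sorted(feasible, key=lambda f: heuristic(current, f), reverse=True)
            # Step 2: pick uniformly at random among the current best k features.
            pick = rng.choice(ranked[:k])
            chosen.add(pick)
            spent += costs[pick]
        # Step 3: keep the best of the p candidate solutions.
        candidate = frozenset(chosen)
        q = quality(candidate)
        if q > best_q:
            best, best_q = candidate, q
    return best


# Toy data (invented): per-feature test cost and an additive "value".
COSTS = {"a": 3.0, "b": 2.0, "c": 2.0}
VALUES = {"a": 5.0, "b": 3.0, "c": 3.0}


def value_per_cost(selected: FrozenSet[str], f: str) -> float:
    # Stand-in heuristic: benefit per unit of test cost (ignores `selected`).
    return VALUES[f] / COSTS[f]


def total_value(subset: FrozenSet[str]) -> float:
    # Stand-in quality measure: sum of the toy feature values.
    return sum(VALUES[f] for f in subset)
```

With `budget=4.0` and `k=1`, the sketch reduces to a plain greedy construction: it always takes `a` (ratio 5/3) first, after which neither `b` nor `c` fits, so the result is `{a}` with value 5. With `k=2`, any run that happens to draw `b` first ends with `{b, c}` and value 6, so repeating the construction `p` times gives the randomized variant a chance to escape the locally optimal first choice.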
Pages: 199-211
Number of pages: 12