Data Quality Assessment and Recommendation of Feature Selection Algorithms: An Ontological Approach

Cited by: 4
Authors
Nayak, Aparna [1]
Bozic, Bojan [1]
Longo, Luca [1]
Affiliations
[1] Technol Univ Dublin, SFI Ctr Res Training Machine Learning, Sch Comp Sci, Dublin, Ireland
Source
JOURNAL OF WEB ENGINEERING | 2023 / Vol. 22 / Issue 01
Funding
Science Foundation Ireland;
Keywords
Data quality; feature selection algorithm; meta-features; ontology; recommendation; AUTOMATIC RECOMMENDATION; SYSTEM; CLASSIFICATION; SWRL;
DOI
10.13052/jwe1540-9589.2219
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Feature selection plays an important role in machine learning and data mining problems. Identifying the best feature selection algorithm for removing irrelevant and redundant features is a complex task. This research addresses that task by recommending a feature selection algorithm based on dataset meta-features. The main contribution of the work is the use of Semantic Web principles to develop a recommendation model for feature selection algorithms. Dataset meta-features are modeled in a domain ontology, and a set of Semantic Web Rule Language (SWRL) predictive rules is proposed to recommend a feature selection algorithm. The result of this research is the feature selection algorithm recommendation based on data characteristics and quality (FSDCQ) ontology, which not only supports recommendations but also identifies data points with data quality violations. An experiment is conducted on classification datasets from the UCI repository to evaluate the proposed ontology. The usefulness and effectiveness of the proposed method are evaluated by comparing it with k-NN, the widely used recommendation method in the literature. Results show that the ontology-based recommendations are as good as those of the k-NN model, with added benefits.
Pages: 175-196
Page count: 22
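
The abstract compares the ontology-based recommender against a k-NN baseline that works on dataset meta-features. As a rough illustration of that baseline only (not the authors' FSDCQ ontology or their SWRL rules), the sketch below computes a few simple meta-features for a dataset and recommends the algorithm that wins a majority vote among the k most similar known datasets. The function names, the choice of meta-features, and every entry in `TOY_KNOWLEDGE_BASE` are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def meta_features(rows, labels):
    """Compute four simple meta-features (an illustrative choice, not the paper's set)."""
    n_instances = len(rows)
    n_features = len(rows[0])
    class_counts = Counter(labels)
    # Shannon entropy of the class distribution, in bits.
    entropy = -sum(
        (c / n_instances) * math.log2(c / n_instances)
        for c in class_counts.values()
    )
    return [
        math.log10(n_instances),
        math.log10(n_features),
        float(len(class_counts)),
        entropy,
    ]


def recommend(query, knowledge_base, k=3):
    """Return the majority-vote algorithm among the k nearest known datasets."""
    # Euclidean distance between meta-feature vectors.
    nearest = sorted(knowledge_base, key=lambda entry: math.dist(query, entry[0]))[:k]
    votes = Counter(algorithm for _, algorithm in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical meta-knowledge base: (meta-feature vector, best-performing algorithm).
# These values are invented for the example and carry no empirical meaning.
TOY_KNOWLEDGE_BASE = [
    ([0.6, 0.3, 2.0, 1.0], "InfoGain"),
    ([0.7, 0.3, 2.0, 0.9], "InfoGain"),
    ([3.0, 2.0, 10.0, 3.2], "ReliefF"),
    ([2.5, 1.5, 5.0, 2.0], "CFS"),
]

if __name__ == "__main__":
    rows = [[0, 1], [1, 0], [1, 1], [0, 0]]
    labels = [0, 1, 0, 1]
    print(recommend(meta_features(rows, labels), TOY_KNOWLEDGE_BASE, k=3))
```

In practice the meta-features would usually be normalized before distances are computed, since unscaled features such as the class count can dominate the Euclidean distance.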