Feature Selection Methods for Linked Data: Limitations, Capabilities and Potentials

Cited by: 1
Authors
Cherrington, Marianne [1]
Lu, Joan [3]
Airehrour, David [2]
Xu, Qiang [4]
Madanian, Samaneh [5]
Wade, Stephen [3]
Affiliations
[1] Univ Huddersfield, Sch Comp & Engn, Huddersfield, W Yorkshire, England
[2] Unitec Inst Technol, Sch Appl Business, Auckland, New Zealand
[3] Univ Huddersfield, Dept Comp Sci, Huddersfield, W Yorkshire, England
[4] Univ Huddersfield, Dept Engn Technol, Huddersfield, W Yorkshire, England
[5] Auckland Univ Technol, Dept Comp Sci, Auckland, New Zealand
Source
BDCAT'19: PROCEEDINGS OF THE 6TH IEEE/ACM INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING, APPLICATIONS AND TECHNOLOGIES | 2019
Keywords
Linked Data (LD); Feature Selection (FS); Heterogeneous Data; High-Dimensional Data (HDD); Dimensionality Reduction; SOCIAL MEDIA DATA; GENE SELECTION; MICROARRAY DATA; CLASSIFICATION; CANCER; ALGORITHMS; PREDICTION;
DOI
10.1145/3365109.3368792
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Feature selection is an important pre-processing, data mining, and knowledge discovery tool for data analysis. By eliminating redundant and irrelevant features from high-dimensional data, feature selection diminishes the 'curse of dimensionality' to improve performance. Data are becoming increasingly complex; heterogeneous data may often be viewed as natural collections of linked objects. Linked data are structured data that are connected with other data sources through the use of semantic queries. They are increasingly prevalent in social media websites and biological networks. Many feature selection methods assume independent and identically distributed (IID) data, a condition violated with linked data. In this paper, a review of current feature selection techniques for linked data is presented. Several approaches are examined in various contexts so that performance issues and ongoing challenges can be assessed. The major contribution of this paper is to underscore contemporary uses and limitations of linked data feature selection techniques, with the purpose of informing existing capabilities and current potentials for key areas of future research and application.
Pages: 103-112
Page count: 10