An Efficient Unsavory Data Detection Method for Internet Big Data

Cited by: 0
Authors
Ren, Peige [1]
Wang, Xiaofeng [1]
Sun, Hao [1]
Xu, Fen [2]
Zhao, Baokang [1]
Wu, Chunqing [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Hyperveloc Aerodynam Inst, China Aerodynam Res & Dev Ctr, Mianyang, Peoples R China
Source
INFORMATION AND COMMUNICATION TECHNOLOGY | 2015 / Vol. 9357
Keywords
High-dimensional feature space; Principal component analysis; Multi-dimensional index; Semantics-based similarity search
DOI
10.1007/978-3-319-24315-3_21
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the explosion of information technologies, the volume and diversity of data in cyberspace are growing rapidly; meanwhile, unsavory data are harming the security of the Internet. Detecting unsavory data in Internet big data based on their inner semantic information is therefore of growing importance. In this paper, we propose the i-Tree method, an intelligent semantics-based unsavory data detection method for Internet big data. Firstly, the Internet big data are mapped into a high-dimensional feature space, where they are represented as high-dimensional points. Secondly, to address the "curse of dimensionality" problem of the high-dimensional feature space, principal component analysis (PCA) is used to reduce the dimensionality of the feature space. Thirdly, in the newly generated feature space, we cluster the data objects, transform the data clusters into regular unit hyper-cubes, and create a one-dimensional index for the data objects based on the idea of a multi-dimensional index. Finally, we realize semantics-based data detection for a given unsavory data object by means of a similarity search algorithm, and the experimental results show that our method achieves much better efficiency.
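The pipeline outlined in the abstract (reduce dimensionality with PCA, cluster, build a one-dimensional index, then run a similarity search) can be illustrated with a minimal Python sketch. This is not the i-Tree implementation: the abstract does not specify the clustering algorithm or the index mapping, so the sketch assumes plain k-means and replaces the unit hyper-cube transform with a simpler distance-to-centre key in the style of iDistance. All function names (`pca_reduce`, `kmeans`, `build_index`, `range_search`) are hypothetical, and the input is assumed to be a matrix `X` whose rows are already-extracted feature vectors.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def kmeans(Z, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering algorithm)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            members = Z[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers


def build_index(Z, labels, centers):
    """Sort points by a 1-D key: cluster_id * stride + distance to centre.

    Assumption: this iDistance-style key stands in for the hyper-cube
    based mapping mentioned in the abstract.
    """
    dist = np.linalg.norm(Z - centers[labels], axis=1)
    stride = dist.max() + 1.0  # keeps the key ranges of clusters disjoint
    keys = labels * stride + dist
    order = np.argsort(keys)
    return keys[order], order, stride


def range_search(q, r, Z, centers, keys, order, stride, eps=1e-9):
    """Indices of rows of Z within Euclidean distance r of q."""
    hits = []
    for c, center in enumerate(centers):
        dq = np.linalg.norm(q - center)
        # Triangle inequality: any match x in cluster c satisfies
        # dq - r <= ||x - center|| <= dq + r.
        lo = c * stride + max(dq - r, 0.0) - eps
        hi = min(c * stride + dq + r + eps, (c + 1) * stride - 0.5)
        i = np.searchsorted(keys, lo, side="left")
        j = np.searchsorted(keys, hi, side="right")
        cand = order[i:j]
        # Refine the candidates with an exact distance check.
        exact = np.linalg.norm(Z[cand] - q, axis=1) <= r
        hits.extend(cand[exact].tolist())
    return sorted(hits)
```

In this sketch, a query is a known unsavory object projected into the same reduced space; the sorted keys let each cluster be probed with two binary searches, and only the surviving candidates pay for an exact distance computation. A new query vector would need to be centred and projected with the same mean and components as the indexed data, which the sketch omits for brevity.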
Pages: 213-220
Number of pages: 8
Related papers
8 items in total
  • [1] Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases
    Böhm, C
    Berchtold, S
    Keim, D
    [J]. ACM COMPUTING SURVEYS, 2001, 33 (03) : 322 - 373
  • [2] Fedorchenko A., 2015, J WIREL MOB NETW UBI, V6, P41
  • [3] Jagadish H. V., 2008, ENCY GIS, P469
  • [4] Shahzad RK., 2013, J WIRELESS MOBILE NE, V4, P98
  • [5] Skovoroda A., 2015, Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications, V6, P78
  • [6] A Convergent Solution to Matrix Bidirectional Projection Based Feature Extraction with Application to Face Recognition
    Zhan, Yubin
    Yin, Jianping
    Liu, Xinwang
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2011, 4 (05) : 863 - 873
  • [7] Robust local tangent space alignment via iterative weighted PCA
    Zhan, Yubin
    Yin, Jianping
    [J]. NEUROCOMPUTING, 2011, 74 (11) : 1985 - 1993
  • [8] Making the pyramid technique robust to query types and workloads
    Zhang, R
    Ooi, BC
    Tan, KL
    [J]. 20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004 : 313 - 324