Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering

Times cited: 0
Authors
Juvonen, Antti [1 ]
Sipola, Tuomo [1 ]
Affiliations
[1] Univ Jyvaskyla, Dept Math Informat Technol, Jyvaskyla, Finland
Source
IV INTERNATIONAL CONGRESS ON ULTRA MODERN TELECOMMUNICATIONS AND CONTROL SYSTEMS 2012 (ICUMT) | 2012
Keywords
intrusion detection; anomaly detection; n-grams; diffusion map; k-means; data mining; machine learning; DIFFUSION MAPS; KNOWLEDGE;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then be analyzed in more detail. We expand our previous work by elaborating the cluster analysis after obtaining the low-dimensional representation. The framework was tested with actual server log data collected from a large web service. Several previously unknown intrusions were found. The proposed methods could be customized to analyze any kind of log data. The system could be used as a real-time anomaly detection system in any network where sufficient data is available.
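
The pipeline summarized in the abstract (n-gram features, diffusion map, clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sample queries, the function names `ngram_matrix`, `diffusion_map` and `two_means`, the median-based kernel bandwidth and the extreme-point k-means initialization are all assumptions made for illustration, and the principal component analysis step is omitted for brevity.

```python
import numpy as np

def ngram_matrix(lines, n=2):
    """Count character n-grams; rows are log lines, columns are distinct n-grams."""
    vocab = sorted({s[i:i + n] for s in lines for i in range(len(s) - n + 1)})
    index = {g: j for j, g in enumerate(vocab)}
    X = np.zeros((len(lines), len(vocab)))
    for r, s in enumerate(lines):
        for i in range(len(s) - n + 1):
            X[r, index[s[i:i + n]]] += 1
    return X, vocab

def diffusion_map(X, dims=2, epsilon=None):
    """Diffusion coordinates from a Gaussian kernel on the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    if epsilon is None:
        epsilon = np.median(sq[sq > 0])  # bandwidth heuristic (assumption)
    W = np.exp(-sq / epsilon)
    d = W.sum(axis=1)
    # S = D^-1/2 W D^-1/2 is symmetric and shares eigenvalues with P = D^-1 W.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Drop the trivial first pair (eigenvalue 1, constant eigenvector).
    return psi[:, 1:dims + 1] * vals[1:dims + 1], vals

def two_means(Y, iters=20):
    """Tiny 2-cluster k-means; centroids start at the extremes of coordinate 0."""
    centers = Y[[np.argmin(Y[:, 0]), np.argmax(Y[:, 0])]].copy()
    for _ in range(iters):
        dist = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = Y[labels == k].mean(axis=0)
    return labels

# Hypothetical HTTP queries: five ordinary requests and one injection attempt.
lines = [f"/index.php?id={i}" for i in range(1, 6)]
lines.append("/index.php?id=1' OR '1'='1")

X, vocab = ngram_matrix(lines)
coords, eigvals = diffusion_map(X)
labels = two_means(coords)
print(labels)  # the last line should fall into a cluster of its own
```

In this toy data the injection attempt contains many bigrams that the ordinary queries lack, so it sits far from them in the diffusion coordinates and the clustering isolates it. Real log data would need a more careful choice of n, kernel bandwidth and number of clusters.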
Pages: 274 - 279
Number of pages: 6
Related papers
19 in total
  • [1] Abou-Assaleh T, 2004, P INT COMP SOFTW APP, P41
  • [2] [Anonymous], PATTERN ANAL MACHINE
  • [3] [Anonymous], 2007, NIST SPECIAL PUBLICA
  • [4] Diffusion maps
    Coifman, Ronald R.
    Lafon, Stephane
    [J]. APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS, 2006, 21 (01) : 5 - 30
  • [5] GAUGING SIMILARITY WITH N-GRAMS - LANGUAGE-INDEPENDENT CATEGORIZATION OF TEXT
    DAMASHEK, M
    [J]. SCIENCE, 1995, 267 (5199) : 843 - 848
  • [6] David G., 2009, THESIS TEL AVIV U
  • [7] David G., 2011, APPL COMPUTATIONAL H
  • [8] The KDD process for extracting useful knowledge from volumes of data
    Fayyad, U
    PiatetskyShapiro, G
    Smyth, P
    [J]. COMMUNICATIONS OF THE ACM, 1996, 39 (11) : 27 - 34
  • [9] Fayyad U, 1996, AI MAG, V17, P37
  • [10] Ganapathiraju M., 2002, HLT'02 Proceedings of the second international conference on Human Language Technology Research, P76