Applying Random Projection to the Classification of Malicious Applications using Data Mining Algorithms

Times cited: 0
Authors
Durand, Jan [1]
Atkison, Travis [1]
Affiliation
[1] Louisiana Tech Univ, Dept Comp Sci, Ruston, LA 71270 USA
Source
PROCEEDINGS OF THE 50TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE | 2012
Keywords
Malicious software detection; information retrieval; n-gram analysis; random projection; data mining
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This research is part of a continuing effort to show the viability of random projection as a feature extraction and reduction technique for producing more accurate malware classifiers. In this paper, we use a vector space model with n-gram analysis to produce weighted feature vectors from binary executables. We then reduce these vectors to a smaller feature set in two ways, with the random projection method proposed by Achlioptas and with mutual-information feature selection, producing two separate data sets. Next, we apply several popular machine learning algorithms, including the J48 decision tree, naive Bayes, support vector machines, and an instance-based learner, to each data set to produce classifiers for detecting malicious executables. Evaluating the performance of the resulting classifiers, we find that a data set reduced by random projection can improve the performance of the support vector machine and instance-based learner classifiers.
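The abstract describes a two-branch feature pipeline: n-gram vectors from executables, reduced either by Achlioptas random projection or by mutual-information selection. The following is a minimal, standard-library-only Python sketch of that pipeline, not the authors' implementation. It assumes byte-level n-grams, TF-IDF weighting (the abstract says only "weighted"), and binary n-gram presence when scoring mutual information; every function name, default (such as `n=4`, `k`, `seed`), and the toy data are hypothetical. The projection matrix follows Achlioptas's construction: entries are sqrt(3) times +1, 0, or -1 with probabilities 1/6, 2/3, 1/6, and the product is scaled by 1/sqrt(k).

```python
import math
import random
from collections import Counter

# Minimal sketch of the pipeline the abstract describes. Assumptions (not from the paper):
# byte-level n-grams, TF-IDF weighting, binary n-gram presence for mutual information,
# and all function names, defaults, and toy data below, which are hypothetical.


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count overlapping byte n-grams in one executable's raw bytes."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def tfidf_vectors(docs: list[Counter]) -> tuple[list[bytes], list[list[float]]]:
    """Build a shared vocabulary and TF-IDF weighted vectors (assumed weighting)."""
    vocab = sorted(set().union(*docs))
    df = Counter(g for d in docs for g in d)  # number of files containing each n-gram
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1
        vectors.append([d[g] / total * math.log(len(docs) / df[g]) for g in vocab])
    return vocab, vectors


def achlioptas_matrix(d: int, k: int, seed: int = 0) -> list[list[float]]:
    """d x k matrix with entries sqrt(3) * (+1, 0, -1) drawn with probabilities 1/6, 2/3, 1/6."""
    rng = random.Random(seed)
    s = math.sqrt(3)
    return [[rng.choices((s, 0.0, -s), weights=(1, 4, 1))[0] for _ in range(k)]
            for _ in range(d)]


def random_project(vectors: list[list[float]], r: list[list[float]]) -> list[list[float]]:
    """Map each d-dimensional row a to k dimensions: e = (1 / sqrt(k)) * a R."""
    k = len(r[0])
    scale = 1 / math.sqrt(k)
    return [[scale * sum(a[i] * r[i][j] for i in range(len(a))) for j in range(k)]
            for a in vectors]


def mutual_information(docs: list[Counter], labels: list[int], gram: bytes) -> float:
    """Mutual information (in nats) between presence of `gram` and the class label."""
    n = len(docs)
    joint = Counter((gram in d, y) for d, y in zip(docs, labels))
    px = Counter(gram in d for d in docs)
    py = Counter(labels)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def select_by_mi(docs: list[Counter], labels: list[int], vocab: list[bytes], k: int) -> list[bytes]:
    """Keep the k n-grams with the highest mutual information."""
    return sorted(vocab, key=lambda g: mutual_information(docs, labels, g), reverse=True)[:k]


if __name__ == "__main__":
    # Toy byte strings standing in for executables; labels 1 = malicious, 0 = benign.
    samples = [b"\x90\x90\xeb\xfe\x90", b"\x90\x90\xeb\xfe\xcc",
               b"MZ\x00\x00PE", b"MZ\x00\x01PE"]
    labels = [1, 1, 0, 0]
    docs = [byte_ngrams(s, n=2) for s in samples]
    vocab, vecs = tfidf_vectors(docs)
    reduced = random_project(vecs, achlioptas_matrix(len(vocab), k=3))  # data set 1
    selected = select_by_mi(docs, labels, vocab, k=3)  # feature subset for data set 2
    print(len(vocab), "features ->", len(reduced[0]), "projected dimensions")
    print("top n-grams by mutual information:", selected)
```

Either reduced representation would then be handed to the classifiers named in the abstract; that step is omitted here. The sparse Achlioptas matrix is attractive because two thirds of its entries are zero, so most of the multiply-adds in the projection can be skipped in an optimized implementation.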
Pages: 6
References (42 in total)
  • [1] Abou-Assaleh T, 2004, P INT COMP SOFTW APP, P41
  • [2] Abou-Assaleh T, 2004, P 28 ANN INT COMP SO, V2, P2
  • [3] Achlioptas D, 2001, P 20 ACM SIGMOD SIGA, DOI 10.1145/375551.375608
  • [4] [Anonymous], 1961, Adaptive Control Processes: a Guided Tour, DOI 10.1515/9781400874668
  • [5] [Anonymous], 1997, ICML
  • [6] [Anonymous], 2014, C4.5: Programs for Machine Learning
  • [7] [Anonymous], 2001, IEEE Data Eng. Bull.
  • [8] [Anonymous], SOFTWARE FORENSICS E
  • [9] Arnold W, 2000, P 2000 INT VIR B C S
  • [10] Atkison T, 2009, P 47 ACM SE C CLEMS