Empirical Comparison of Cross-Validation and Test Data on Internet Traffic Classification Methods

Cited by: 4
Authors
Jonathan, Oluranti
Omoregbe, Nicholas
Misra, Sanjay
Source
3RD INTERNATIONAL CONFERENCE ON SCIENCE AND SUSTAINABLE DEVELOPMENT (ICSSD 2019): SCIENCE, TECHNOLOGY AND RESEARCH: KEYS TO SUSTAINABLE DEVELOPMENT | 2019 / Vol. 1299
Keywords
cross-validation; classification; performance metrics; machine learning
DOI
10.1088/1742-6596/1299/1/012044
Chinese Library Classification (CLC) number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
In this paper, we compare two validation methods used to estimate the performance of classification algorithms in a scenario without problem-specific knowledge. One way to measure the performance of a classification algorithm is to determine its prediction error rate. However, this value cannot be computed exactly; it can only be estimated. In this work, we apply and compare two common estimation methods, namely test data and cross-validation. Specifically, we analyze and compare the statistical properties of the K-fold cross-validation and test data estimators of the prediction error rates of six classifiers: Naive Bayes, KNN, Random Forest, SVM, J48, and OneR. The study indicates that repeated cross-validation tends to stabilize the prediction error estimate, which in turn reduces the variance of the prediction error estimator when compared with test data. The NIMS dataset, collected over a network, was employed in the experimental study.
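The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' experiment: it assumes synthetic one-feature data in place of the NIMS dataset, a simple nearest-centroid classifier in place of the six classifiers studied, and hypothetical helper names (`make_data`, `holdout_estimate`, `repeated_kfold_estimate`, `compare`). It only shows how the variance of a test data (hold-out) error estimate and a repeated K-fold cross-validation error estimate might be measured over many random splits.

```python
import random
import statistics

def make_data(n, rng):
    """Synthetic two-class data with one Gaussian feature (an assumed stand-in, not NIMS)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        x = rng.gauss(1.0 if label else -1.0, 1.5)
        data.append((x, label))
    return data

def fit_centroids(train):
    """Nearest-centroid classifier: store the mean feature value of each class."""
    means = {}
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        means[c] = sum(xs) / len(xs) if xs else 0.0
    return means

def error_rate(model, test):
    """Fraction of test points whose nearest class centroid is not their true label."""
    wrong = sum(1 for x, y in test
                if min(model, key=lambda c: abs(x - model[c])) != y)
    return wrong / len(test)

def holdout_estimate(data, rng, test_fraction=0.3):
    """Test data estimator: one random train/test split."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    return error_rate(fit_centroids(train), test)

def kfold_estimate(data, rng, k=10):
    """K-fold estimator: average the error over k held-out folds."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(error_rate(fit_centroids(train), folds[i]))
    return sum(errors) / k

def repeated_kfold_estimate(data, rng, k=10, repeats=5):
    """Repeated K-fold: average several K-fold runs with different shuffles."""
    return statistics.mean(kfold_estimate(data, rng, k) for _ in range(repeats))

def compare(n=200, trials=100, seed=0):
    """Return (hold-out variance, repeated K-fold variance) over random splits."""
    rng = random.Random(seed)
    data = make_data(n, rng)
    holdout = [holdout_estimate(data, rng) for _ in range(trials)]
    repeated = [repeated_kfold_estimate(data, rng) for _ in range(trials)]
    return statistics.variance(holdout), statistics.variance(repeated)

if __name__ == "__main__":
    var_holdout, var_cv = compare()
    print("hold-out variance:", var_holdout)
    print("repeated 10-fold variance:", var_cv)
```

In this sketch the variance is taken over repeated random splits of one fixed dataset, which is only one of several ways to define the variability of an error estimator; the paper may define it differently.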
Pages: 9