Dataset Quality Assessment in Autonomous Networks with Permutation Testing

Cited by: 4
Authors
Camacho, Jose [1]
Wasielewska, Katarzyna [1]
Affiliations
[1] Univ Granada, Dept Signal Theory Telemat & Comm, CITIC, Granada, Spain
Source
PROCEEDINGS OF THE IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM 2022 | 2022
Keywords
data quality assessment; permutation testing; anomaly detection; classification; network data; autonomous networks; self-driving networks;
DOI
10.1109/NOMS54207.2022.9789767
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The development of autonomous or self-driving networks is one of the main challenges faced by the telecommunication industry. Future networks are expected to realise a number of tasks, including network optimization and failure recovery, with minimal human supervision. In this context, the network community (manufacturers, operators, researchers, etc.) is looking at Machine Learning (ML) methods with high expectations. However, ML models can only be as good as the data they are trained on, which means that autonomous networks also require a sound autonomous procedure to assess, and if possible improve, data quality. Although the application of ML techniques in communication networks is ample in the literature, analyzing the quality of the network data seems an ignored problem. This paper presents work in progress on the application of permutation testing to assess the quality of network datasets. We illustrate our approach on a number of simple synthetic datasets with pre-established quality and then we demonstrate its application in a publicly available network dataset.
Pages: 4