Framework Based on Simulation of Real-World Message Streams to Evaluate Classification Solutions

Times cited: 0
Authors
Hojas-Mazo, Wenny [1]
Macia-Perez, Francisco [2]
Martinez, Jose Vicente Berna [2]
Moreno-Espino, Mailyn [3]
Fonseca, Iren Lorenzo [2]
Pavon, Juan [4]
Affiliations
[1] Univ Tecnol La Habana, Fac Ingn Informat, Dept Inteligencia Artificial Infraestruct Sistemas, Calle 114 11901, Entre 119 & 127, Marianao 19390, La Habana, Cuba
[2] Univ Alicante, Dept Comp Sci & Technol, Alicante 03690, Spain
[3] Inst Politecn Nacl, Ctr Invest Comp, Ciudad De Mexico 07738, Mexico
[4] Univ Complutense Madrid, Inst Tecnol Conocimiento, Madrid 28040, Spain
Keywords
classification; evaluation; non-stationary message streams; simulation
DOI
10.3390/a17010047
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Analysing message streams in a dynamic environment is challenging. Various methods and metrics are used to evaluate message classification solutions, but they often fail to simulate the actual environment realistically. As a result, evaluations can produce overly optimistic results, which makes current evaluations of classification solutions inadequate for real-world environments. This paper proposes a framework, based on the simulation of real-world message streams, for evaluating classification solutions. The framework consists of four modules: message stream simulation, processing, classification and evaluation. The simulation module uses simulation techniques and queueing theory to replicate a real-world message stream. The processing module refines the input messages to optimise their classification. The classification module categorises the generated message stream using existing classification solutions. The evaluation module measures the performance of these solutions in terms of accuracy, precision and recall. The framework can model the behaviour of heterogeneous sources, such as spammers with different attack strategies, press media or social networks. Each source profile generates its own message stream, and these streams are merged into the main stream for greater realism. A spam detection case study demonstrates an implementation of the proposed framework and identifies latency and message body obfuscation as critical parameters for classification quality.
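The four-module pipeline described above can be illustrated with a small Python sketch. This is a hypothetical toy, not the authors' implementation: the source profiles, the look-alike obfuscation (replacing `E` with `3`), the keyword classifier and every name (`SourceProfile`, `simulate_stream`, `queue_latencies`, `preprocess`, `evaluate`) are assumptions made for illustration. Each profile emits a Poisson stream (exponential inter-arrival times). The profile streams are merged by timestamp into the main stream, which feeds a single FIFO server with exponential service times (an M/M/1 queue) to obtain per-message latency. Each message is then preprocessed, classified and scored with accuracy, precision and recall, treating spam as the positive class.

```python
import heapq
import random
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical source profile (spammer, press medium, social network...)."""
    name: str
    rate: float         # Poisson arrival rate (messages per second)
    is_spam: bool       # ground-truth label of every message from this source
    body: str           # template message body (illustrative only)
    obfuscation: float  # probability that a body is obfuscated


@dataclass
class Message:
    time: float
    source: str
    body: str
    is_spam: bool


def simulate_source(p, horizon, rng):
    """Yield messages with exponential inter-arrival times (a Poisson process)."""
    t = rng.expovariate(p.rate)
    while t < horizon:
        body = p.body
        if rng.random() < p.obfuscation:
            body = body.replace("E", "3")  # crude look-alike obfuscation (assumption)
        yield Message(t, p.name, body, p.is_spam)
        t += rng.expovariate(p.rate)


def simulate_stream(profiles, horizon, rng):
    """Merge the per-profile streams into one main stream ordered by arrival time."""
    streams = [list(simulate_source(p, horizon, rng)) for p in profiles]
    return list(heapq.merge(*streams, key=lambda m: m.time))


def queue_latencies(stream, service_rate, rng):
    """Single FIFO server with exponential service times (M/M/1 for Poisson input)."""
    free_at, latencies = 0.0, []
    for m in stream:
        start = max(m.time, free_at)               # wait while the server is busy
        free_at = start + rng.expovariate(service_rate)
        latencies.append(free_at - m.time)         # waiting time + service time
    return latencies


def preprocess(body):
    """Processing module: lowercase, keep only ASCII letters, digits and spaces."""
    return re.sub(r"[^a-z0-9 ]", "", body.lower())


def keyword_classifier(text):
    """Toy stand-in for an existing classification solution: flags 'free' as spam."""
    return "free" in text.split()


def evaluate(stream, classify):
    """Evaluation module: accuracy, precision and recall with spam as positive class."""
    tp = fp = tn = fn = 0
    for m in stream:
        predicted = classify(preprocess(m.body))
        if predicted and m.is_spam:
            tp += 1
        elif predicted:
            fp += 1
        elif m.is_spam:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    profiles = [
        SourceProfile("spammer_plain", 0.5, True, "Get FREE money now!", 0.0),
        SourceProfile("spammer_obfuscating", 0.5, True, "Get FREE money now!", 0.8),
        SourceProfile("press", 0.3, False, "Free concert in the park tonight.", 0.0),
        SourceProfile("social", 1.0, False, "Meeting notes attached.", 0.0),
    ]
    rng = random.Random(42)
    stream = simulate_stream(profiles, horizon=600.0, rng=rng)
    latencies = queue_latencies(stream, service_rate=3.0, rng=rng)
    print(evaluate(stream, keyword_classifier))
    print(f"mean latency: {sum(latencies) / len(latencies):.3f} s")
```

Independent Poisson streams merge into a Poisson stream whose rate is the sum of the individual rates. The example's total arrival rate is therefore 2.3 messages/s, and the queue stays stable only while this remains below `service_rate` (3.0 here). In this toy setting, raising the obfuscating spammer's `obfuscation` lowers recall, and the press source's benign use of "free" lowers precision. This loosely echoes the abstract's point that body obfuscation affects classification quality.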
Pages: 15