A Flat-Hierarchical Approach Based on Machine Learning Model for e-Commerce Product Classification

Cited by: 2
Authors
Cotacallapa, Harold [1]
Saboya, Nemias [1]
Canas Rodrigues, Paulo [2]
Salas, Rodrigo [3,4]
Linkolk Lopez-Gonzales, Javier [5]
Affiliations
[1] Univ Peruana Union, Fac Ingn & Arquitectura, Lima 15464, Peru
[2] Univ Fed Bahia, Dept Stat, BR-40110909 Salvador, Brazil
[3] Univ Valparaiso, Escuela Ingn C Biomed, Valparaiso 2362905, Chile
[4] Millennium Inst Intelligent Healthcare Engn iHealt, Santiago 7820436, Chile
[5] Univ Peruana Union, Escuela Posgrad, Lima 15464, Peru
Keywords
Electronic commerce; Classification algorithms; Machine learning algorithms; Bayes methods; Support vector machines; Random forests; Performance evaluation; Data models; South America; Logistic regression; Inventory management; Product delivery; Machine learning; e-commerce; hierarchical product classification; local classifier per level; ensemble; LOGISTIC-REGRESSION; ALGORITHMS;
DOI
10.1109/ACCESS.2024.3400693
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Within e-commerce, optimizing the product classification process is critically important because it directly affects operational efficiency and profitability. Machine learning algorithms are a leading solution for automating this process effectively. These models are commonly designed using either a flat or a local (hierarchical) approach, but each has significant limitations: the local approach introduces taxonomic inconsistencies in its predictions, whereas the flat approach becomes inefficient on large datasets with high granularity. Our research therefore introduces a hierarchical product classification solution based on a machine learning model that integrates the flat and local (hierarchical) approaches, using a 4-level electronic product dataset obtained from a well-known e-commerce platform in Latin America. To this end, we conducted a comparative analysis of seven machine learning algorithms: Multinomial Naive Bayes, Linear Support Vector Classifier, Multinomial Logistic Regression, Random Forest, XGBoost, FastText, and a Voting Ensemble. The resulting hybrid model outperforms models based on a single approach: as measured by the weighted F1-score, it surpassed the top-performing flat model by 0.15% and the leading local (Local Classifier per Level) model by 4.88%. Additionally, this paper contributes to the academic community a large Spanish-language dataset of over one million products and discusses the preprocessing techniques tailored to it. It also addresses the study's inherent limitations and potential avenues for future research in this field.
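To make the contrast between the two approaches concrete, here is a minimal Python sketch under stated assumptions. The toy taxonomy, the function names (`is_consistent`, `hybrid_predict`, `weighted_f1`), and the fallback rule in `hybrid_predict` are hypothetical illustrations; the abstract does not say how the authors actually combine the flat and local classifiers. The sketch shows how a local classifier per level (LCPL), which predicts each level independently, can emit a label path that does not exist in the taxonomy, how a flat classifier over full leaf paths avoids this by construction, and how a weighted F1-score averages per-class F1 by class support.

```python
from collections import Counter

# Hypothetical toy 4-level taxonomy (an assumption for illustration): every
# valid path is a tuple with one label per level, from general to specific.
VALID_PATHS = {
    ("Electronics", "Computing", "Laptops", "Gaming Laptops"),
    ("Electronics", "Computing", "Laptops", "Ultrabooks"),
    ("Electronics", "Audio", "Headphones", "Wireless Headphones"),
}


def is_consistent(path):
    """Return True if the per-level labels form a path that exists in the taxonomy."""
    return tuple(path) in VALID_PATHS


def hybrid_predict(local_path, flat_path):
    """Hypothetical combination rule, not the paper's method.

    Keep the LCPL prediction when it is a valid taxonomy path; otherwise
    fall back to the flat prediction, which is a full leaf path and is
    therefore consistent by construction.
    """
    return tuple(local_path) if is_consistent(local_path) else tuple(flat_path)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's true support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        # n >= 1, so 2*tp + fp + fn >= tp + fn = n > 0 (no division by zero)
        score += (2 * tp / (2 * tp + fp + fn)) * n / total
    return score


# Each level predicted independently: level 2 says "Audio" but levels 3-4
# say "Laptops", so the combined path does not exist in the taxonomy.
local = ("Electronics", "Audio", "Laptops", "Gaming Laptops")
flat = ("Electronics", "Computing", "Laptops", "Gaming Laptops")

print(is_consistent(local))                                      # False
print(hybrid_predict(local, flat) == flat)                       # True
print(round(weighted_f1(["A", "A", "B"], ["A", "B", "B"]), 4))   # 0.6667
```

The fallback rule is only one plausible way to reconcile the two approaches: it keeps the per-level predictions when they are coherent and otherwise relies on the flat model's guarantee of a valid leaf path.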
Pages: 72730-72745
Page count: 16