Product Length Predictions with Machine Learning: An Integrated Approach Using Extreme Gradient Boosting

Cited by: 0
Authors
Thakur A. [1 ]
Kumar A. [1 ]
Mishra S.K. [1 ]
Behera S.K. [2 ]
Sethi J. [3 ]
Sahu S.S. [4 ]
Swain S.K. [1 ]
Affiliations
[1] Department of Electrical and Electronics Engineering, Birla Institute of Technology Mesra, Ranchi
[2] Department of Electronics and Telecommunication Engineering, DRIEMS Autonomous Engineering College, Tangi, Odisha, Cuttack
[3] Department of Electronics and Instrumentation Engineering, Odisha University of Technology and Research, Techno Campus, Ghatikia, Odisha, Bhubaneswar
[4] Department of Electronics and Communication Engineering, Birla Institute of Technology Mesra, Jharkhand, Ranchi
Keywords
Interquartile Range (IQR); Natural Language Processing (NLP); Predictive Modeling; Term Frequency-Inverse Document Frequency (TF-IDF); XGBoost
DOI
10.1007/s42979-024-02999-8
Abstract
This study introduces a novel machine learning approach for predicting product lengths from heterogeneous data (numeric, textual, and categorical) and for extracting the information in that data that improves prediction accuracy. The approach combines text vectorization with Term Frequency-Inverse Document Frequency (TF-IDF), categorical feature encoding with target encoding, and gradient boosting with eXtreme Gradient Boosting (XGBoost). The method begins with thorough data preparation, removing outliers and imputing missing values, and then extracts features from the product titles, descriptions, and bullet points in the dataset. TF-IDF converts this text into numerical feature vectors in which each word's frequency within a product's text is weighted by the word's rarity across the dataset, giving the gradient boosting model numerical input it can use effectively. During training, RandomizedSearchCV tunes the hyperparameters of the XGBoost model, which takes the TF-IDF vectors and the target-encoded product type IDs as input. This allows the model to handle variability and uncertainty in product length predictions. Together, these techniques make the method adaptable and enable accurate product length prediction in e-commerce, which can support inventory management across diverse products. The predictions could further help optimize supply chain operations, improve demand forecasting across a variety of products, and inform strategic planning for procurement and stock levels. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.
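The preprocessing steps the abstract lists (outlier removal, missing-value imputation, TF-IDF vectorization, and target encoding) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation, and several details are assumptions: the function names are hypothetical; outliers are assumed to be removed with Tukey's fences at 1.5 times the IQR (IQR appears among the keywords, but the exact rule is not stated); missing values are assumed to be filled with the median; the TF-IDF variant uses a smoothed IDF and L2 normalization, similar to scikit-learn's `TfidfVectorizer` defaults, but tokenizes on whitespace only; and target encoding uses simple additive smoothing toward the global mean. The XGBoost model and the RandomizedSearchCV tuning step are not shown.

```python
import math
import statistics
from collections import Counter, defaultdict


def iqr_filter(values, k=1.5):
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    # method="inclusive" interpolates linearly between the sorted values
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw count * idf, with idf = ln((1 + N) / (1 + df)) + 1; each vector
    is then scaled to unit L2 norm. Tokenization is a plain lowercase split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def target_encode(categories, targets, smoothing=1.0):
    """Map each category (e.g. a product type ID) to a smoothed mean target.

    encoding = (sum of targets + smoothing * global mean) / (count + smoothing)
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
            for cat in counts}


if __name__ == "__main__":
    print(iqr_filter([10.0, 12.0, 11.0, 13.0, 500.0]))  # [10.0, 12.0, 11.0, 13.0]
    print(impute_median([2.0, None, 4.0]))  # [2.0, 3.0, 4.0]
    vecs = tfidf(["red cotton shirt", "blue cotton towel"])
    print(vecs[0])  # "cotton" appears in both docs, so it is weighted below "red"
    print(target_encode(["A", "A", "B"], [10.0, 20.0, 40.0]))
```

In a full pipeline, the TF-IDF vectors and the encoded product type IDs would be concatenated into one feature matrix for a gradient boosting regressor. Target encodings should be computed from training data only (ideally out-of-fold), because encoding a row with its own target value leaks the label into the features.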
Related papers (50 in total)
  • [31] An Ensemble Approach to Short-Term Wind Speed Predictions Using Stochastic Methods, Wavelets and Gradient Boosting Decision Trees
    Sivhugwana, Khathutshelo Steven
    Ranganai, Edmore
    WIND, 2024, 4 (01)
  • [32] Development of a wide-range soft sensor for predicting wastewater BOD5 using an eXtreme gradient boosting (XGBoost) machine
    Ching, P. M. L.
    Zou, X.
    Wu, Di
    So, R. H. Y.
    Chen, G. H.
    ENVIRONMENTAL RESEARCH, 2022, 210
  • [34] Click through Rate Effectiveness Prediction on Mobile Ads Using Extreme Gradient Boosting
    Moneera, AlAli
    Maram, AlQahtani
    Azizah, AlJuried
    AlOnizan, Taghareed
    Alboqaytah, Dalia
    Aslam, Nida
    Khan, Irfan Ullah
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 66 (02): 1681-1696
  • [35] Development and validation of a novel diagnostic model for initially clinical diagnosed gastrointestinal stromal tumors using an extreme gradient-boosting machine
    Hu, Bozhi
    Wang, Chao
    Jiang, Kewei
    Shen, Zhanlong
    Yang, Xiaodong
    Yin, Mujun
    Liang, Bin
    Xie, Qiwei
    Ye, Yingjiang
    Gao, Zhidong
    BMC GASTROENTEROLOGY, 2021, 21 (01)
  • [36] Construction and Validation of a Predictive Model for Coronary Artery Disease Using Extreme Gradient Boosting
    Zhang, Zheng
    Shao, Binbin
    Liu, Hongzhou
    Huang, Ben
    Gao, Xuechen
    Qiu, Jun
    Wang, Chen
    JOURNAL OF INFLAMMATION RESEARCH, 2024, 17: 4163-4174
  • [37] Estimation of Fe Grade at an Ore Deposit Using Extreme Gradient Boosting Trees (XGBoost)
    Atalay, Firat
    MINING METALLURGY & EXPLORATION, 2024: 2119-2128
  • [38] Identification of Insider Trading Using Extreme Gradient Boosting and Multi-Objective Optimization
    Deng, Shangkun
    Wang, Chenguang
    Li, Jie
    Yu, Haoran
    Tian, Hongyu
    Zhang, Yu
    Cui, Yong
    Ma, Fangjie
    Yang, Tianxiang
    INFORMATION, 2019, 10 (12)
  • [39] Grid-based Urban Fire Prediction Using Extreme Gradient Boosting (XGBoost)
    Oh, Haeng Yeol
    Jeong, Meong-Hun
    SENSORS AND MATERIALS, 2022, 34 (12): 4879-4890
  • [40] Bayesian-optimized extreme gradient boosting models for classification problems: an experimental analysis of product return case
    Bhattacharjee, Biplab
    Unni, Kavya
    Pratap, Maheshwar
    JOURNAL OF SYSTEMS AND INFORMATION TECHNOLOGY, 2024, 26 (04): 495-527