Real-time user clickstream behavior analysis based on apache storm streaming

Cited by: 3
Authors
Pal, Gautam [1]
Atkinson, Katie [1]
Li, Gangmin [2]
Affiliations
[1] Univ Liverpool, Dept Comp Sci, Liverpool L69 7ZX, Merseyside, England
[2] Univ Bedfordshire, Sch Comp Sci & Technol, Luton LU1 3JU, Beds, England
Keywords
Clickstream analytics; Real-time big data analytics; Real-time data ingestion; Apache Storm; Cassandra; Datastax; Sparsity problem
DOI
10.1007/s10660-021-09518-4
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
This paper presents an approach to analyzing consumers' e-commerce site usage and browsing motifs through pattern mining and surfing behavior. The user-generated clickstream is first stored in the client-side browser. We build an ingestion pipeline that captures this high-velocity data stream from the browser through Apache Storm, Kafka, and Cassandra. Given the consumer's usage pattern, we uncover the user's browsing intent through n-grams and collocation methods. An innovative clustering technique is constructed using the Expectation-Maximization algorithm with a Gaussian Mixture Model. We discuss a framework for predicting a user's clicks from past click sequences using higher-order Markov chains. We developed our model on top of a big data Lambda Architecture, which combines a high-throughput Hadoop batch setup with a low-latency real-time framework over a large distributed cluster. Based on this approach, we developed an experimental setup with an optimized Storm topology and improved Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated by several evaluations on a Microsoft Azure HDInsight Apache Storm deployment and on the Datastax distribution of Cassandra. The paper demonstrates that the proposed techniques help with user experience optimization, building recently viewed product lists, market-driven analyses, and allocation of website resources.
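As a rough illustration of the click-prediction idea mentioned in the abstract, the following minimal sketch builds a second-order Markov chain over page-visit sequences and predicts the most likely next click. It is not the authors' implementation: the function names `train_markov` and `predict_next`, the page labels, and the toy sessions are all hypothetical, and the model here is a plain frequency count with no smoothing.

```python
from collections import Counter, defaultdict


def train_markov(sessions, order=2):
    """Count next-page frequencies for every context of `order` consecutive clicks."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - order):
            context = tuple(session[i:i + order])
            transitions[context][session[i + order]] += 1
    return transitions


def predict_next(transitions, recent_clicks, order=2):
    """Return the most frequent next page after the last `order` clicks, or None if unseen."""
    context = tuple(recent_clicks[-order:])
    counts = transitions.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy sessions; real input would come from the ingested clickstream.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "reviews"],
    ["home", "search", "product", "cart", "checkout"],
]
model = train_markov(sessions, order=2)
print(predict_next(model, ["home", "search", "product"]))
```

Raising `order` conditions the prediction on a longer click history, at the cost of sparser counts for each context.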
Pages: 1829-1859
Page count: 31
Source
Electronic Commerce Research, 2023, 23: 1829-1859