Experience: Analyzing Missing Web Page Visits and Unintentional Web Page Visits from the Client-side Web Logs

Cited by: 1
Authors
Hsu, Che-Yun [1]
Chen, Ting-Rui [1]
Chen, Hung-Hsuan [1]
Affiliations
[1] Natl Cent Univ, Comp Sci & Informat Engn, 300 Zhongda Rd, Taoyuan 320, Taiwan
Source
ACM JOURNAL OF DATA AND INFORMATION QUALITY | 2022, Vol. 14, No. 2
Keywords
Clickstream; user behavior; log analysis; user modeling; VARIABLES; TAU
DOI
10.1145/3490392
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Web logs have been widely used to represent the web page visits of online users. However, we found that web logs in Chrome's browsing history record only 57% of users' visited websites, i.e., nearly half of a user's website visits are not recorded. Additionally, 5.1% of the visits recorded in the web log occur because of unconscious user actions, i.e., these page visits are not initiated by users. We created a Google Chrome plugin and recruited users to install it so that we could collect and analyze conscious URL visits, unconscious URL visits, and "missing" URL visits (i.e., the visits unrecorded in the traditional web log). We reported the statistics of these behaviors. We showed that the ranking of popular website categories based on traditional web logs differs from the rankings obtained when including missing visits or excluding unintentional visits. We predicted users' future behaviors based on three types of training data: all the visits in modern web logs, the intentional visits in web logs, and the intentional visits plus missing visits in web logs. The experimental results indicate that missing visits in web logs may contain additional information, and unintentional visits in web logs may contain more noise than information for user modeling. Consequently, we need to be careful with the observations and conclusions derived from web log analyses because the web log data could be an incomplete and noisy dataset of a user's visited web pages.
Pages: 17