Product Engagement Detection Using Multi-Camera 3D Skeleton Reconstruction and Gaze Estimation

Cited by: 0
Authors
Tanonwong, Matus [1]
Zhu, Yu [1]
Chiba, Naoya [2]
Hashimoto, Koichi [1]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Aoba Ku, Sendai 9808579, Japan
[2] Osaka Univ, Grad Sch Informat Sci & Technol, 1-32 Machikaneyama, Toyonaka, Osaka 5600043, Japan
Keywords
product engagement detection; retail analytics; gaze estimation; 2D/3D imaging; multi-camera system; computer vision
DOI
10.3390/s25103031
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Product engagement detection in retail environments is critical for understanding customer preferences through nonverbal cues such as gaze and hand movements. This study presents a system leveraging a 360-degree top-view fisheye camera combined with two perspective cameras, the only sensors required for deployment, effectively capturing subtle interactions even under occlusion or distant camera setups. Unlike conventional image-based gaze estimation methods that are sensitive to background variations and require capturing a person's full appearance, raising privacy concerns, our approach utilizes a novel Transformer-based encoder operating directly on 3D skeletal keypoints. This innovation significantly reduces privacy risks by avoiding personal appearance data and benefits from ongoing advancements in accurate skeleton estimation techniques. Experimental evaluation in a simulated retail environment demonstrates that our method effectively identifies critical gaze-object and hand-object interactions, reliably detecting customer engagement prior to product selection. Despite yielding slightly higher mean angular errors in gaze estimation compared to a recent image-based method, the Transformer-based model achieves comparable performance in gaze-object detection. Its robustness, generalizability, and inherent privacy preservation make it particularly suitable for deployment in practical retail scenarios such as convenience stores, supermarkets, and shopping malls, highlighting its superiority in real-world applicability.
Pages: 42