Feature-Interaction-Enhanced Sequential Transformer for Click-Through Rate Prediction

Times Cited: 0
Authors
Yuan, Quan [1 ]
Zhu, Ming [2 ]
Li, Yushi [2 ]
Liu, Haozhe [2 ]
Guo, Siao [2 ]
Affiliations
[1] China Univ Geosci, Sch Mech Engn & Elect Informat, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Hubei Key Lab Smart Internet Technol, Wuhan 430074, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 07
Funding
National Natural Science Foundation of China;
Keywords
click-through-rate prediction; feature interaction; sequential recommendation; sequence pooling; self-attention;
DOI
10.3390/app14072760
CLC Classification Number
O6 [Chemistry];
Discipline Code
0703;
Abstract
Click-through rate (CTR) prediction plays a crucial role in online services and applications such as online shopping and advertising, and its performance directly affects user experience and platform revenue. Self-attention-based methods have been widely applied to CTR prediction. Recent works generally adopt the Transformer architecture, in which the self-attention mechanism captures global dependencies among a user's historical interactions to predict the next item. Despite the effectiveness of self-attention in modeling sequential user behaviors, most sequential recommenders hardly exploit feature interaction techniques to extract high-order feature combinations. In this paper, we propose the Feature-Interaction-Enhanced Sequence Model (FESeq), which integrates feature interaction and sequential recommendation in a cascading structure. Specifically, the interacting layer in FESeq serves as an automatic feature engineering step for the Transformer model. We then add a linear time-interval embedding layer and a positional embedding layer to the Transformer in the sequence-refiner layer, so that the model learns both the time intervals and the positional information in the user's behavior sequence. We also design an attention-based sequence pooling layer that models the relevance between the user's historical behaviors and the target item representation through scaled bilinear attention. Our experiments show that the proposed method outperforms all baselines on both public and industrial datasets.
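The most concrete mechanism named in the abstract is the scaled bilinear attention used for sequence pooling. The exact formulation is not given in this record, so the following PyTorch sketch is only an illustration under assumed shapes, not the authors' implementation: each refined behavior vector is scored against the target-item representation with a learned bilinear form, the scores are scaled by sqrt(d) and softmax-normalized, and the sequence is pooled with the resulting weights. All names here (BilinearAttentionPooling, seq, target) are hypothetical.

    import torch
    import torch.nn as nn

    class BilinearAttentionPooling(nn.Module):
        """Pools a behavior sequence into one vector, weighting each
        historical item by its scaled bilinear relevance to the target item.
        A minimal sketch; shapes and parameterization are assumptions."""

        def __init__(self, d_model: int):
            super().__init__()
            # Learned bilinear form: score_t = (h_t W) . q / sqrt(d_model)
            self.W = nn.Parameter(torch.empty(d_model, d_model))
            nn.init.xavier_uniform_(self.W)
            self.scale = d_model ** 0.5

        def forward(self, seq, target, mask=None):
            # seq:    (B, T, d) refined behavior representations
            # target: (B, d)    target-item representation
            # mask:   (B, T)    True at valid (non-padded) positions
            scores = torch.einsum("btd,de,be->bt", seq, self.W, target) / self.scale
            if mask is not None:
                scores = scores.masked_fill(~mask, float("-inf"))
            weights = torch.softmax(scores, dim=-1)          # (B, T)
            return torch.einsum("bt,btd->bd", weights, seq)  # (B, d)

    # Hypothetical usage: pool a batch of 20-step behavior sequences.
    pool = BilinearAttentionPooling(d_model=64)
    seq = torch.randn(32, 20, 64)    # e.g., output of the sequence-refiner layer
    tgt = torch.randn(32, 64)        # target-item embeddings
    user_interest = pool(seq, tgt)   # (32, 64), fed to the prediction head

The scaling by sqrt(d) mirrors scaled dot-product attention and keeps the bilinear scores in a range where the softmax is not saturated; the bilinear matrix W lets the pooling learn an asymmetric relevance between behaviors and the target, which a plain dot product cannot express.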
Pages: 24