BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

Cited by: 4
Authors
Wang, Dong [1 ]
Salamatian, Kave [2 ]
Xia, Yunqing [1 ]
Deng, Weiwei [1 ]
Zhang, Qi [1 ]
Affiliations
[1] Microsoft Corp, STCA, Beijing, Peoples R China
[2] Univ Savoie, Annecy, France
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Keywords
Non-textual features; Multi-modal inputs; Pre-trained language model; CTR prediction; Uni-Attention;
DOI
10.1145/3580305.3599780
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
Although deep pre-trained language models have shown promising benefits in a wide range of industrial scenarios, including Click-Through-Rate (CTR) prediction, integrating pre-trained language models, which handle only textual signals, into a prediction pipeline with non-textual features remains challenging. To date, two directions have been explored for integrating multi-modal inputs when fine-tuning pre-trained language models. The first fuses the output of the language model and the non-textual features through an aggregation layer, yielding an ensemble framework in which the cross-information between textual and non-textual inputs is learned only in the aggregation layer. The second splits and transforms non-textual features into fine-grained tokens that are fed, along with the textual tokens, directly into the transformer layers of the language model. However, by adding extra tokens, this approach increases the complexity of both learning and inference. In this paper, we propose a novel framework, BERT4CTR, that addresses these limitations. The framework leverages a Uni-Attention mechanism to benefit from the interactions between non-textual and textual features, while maintaining low training and inference time costs through dimensionality reduction. We demonstrate through comprehensive experiments on both public and commercial data that BERT4CTR significantly outperforms state-of-the-art approaches for handling multi-modal inputs and is applicable to CTR prediction. Compared with the ensemble framework, BERT4CTR brings more than 0.4% AUC gain on both tested data sets with only a 7% increase in latency.
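The record does not spell out the paper's exact architecture, but the core idea of Uni-Attention, one-directional cross-attention in which a compact non-textual representation queries the textual token representations rather than being appended as extra tokens, can be sketched minimally. The sketch below uses NumPy with random matrices standing in for learned projection weights; all names (`uni_attention`, `d_k`, etc.) are illustrative assumptions, not the paper's API.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def uni_attention(non_text, text_tokens, d_k=16, seed=0):
    """One-directional cross-attention: the non-textual feature vector
    (query) attends to textual token representations (keys/values).
    Projection matrices are random stand-ins for learned weights."""
    rng = np.random.default_rng(seed)
    d_nt = non_text.shape[-1]
    d_t = text_tokens.shape[-1]
    W_q = rng.standard_normal((d_nt, d_k)) / np.sqrt(d_nt)
    W_k = rng.standard_normal((d_t, d_k)) / np.sqrt(d_t)
    W_v = rng.standard_normal((d_t, d_k)) / np.sqrt(d_t)
    Q = non_text @ W_q                # (1, d_k): one query per sample
    K = text_tokens @ W_k             # (L, d_k)
    V = text_tokens @ W_v             # (L, d_k)
    scores = Q @ K.T / np.sqrt(d_k)   # (1, L) attention over text tokens
    return softmax(scores) @ V        # (1, d_k) fused representation

# toy example: one 4-dim non-textual vector, 8 textual token embeddings of dim 32
fused = uni_attention(np.ones((1, 4)), np.ones((8, 32)))
print(fused.shape)  # (1, 16)
```

Because only the single non-textual query attends to the text (not the reverse, and no tokens are added to the transformer's input sequence), the attention cost grows linearly in the text length instead of quadratically in an enlarged sequence, which is consistent with the abstract's claim of low added latency.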
Pages: 5039-5050
Page count: 12