Embedding Normalization: Significance Preserving Feature Normalization for Click-Through Rate Prediction

Cited by: 0
Authors
Yi, Joonyoung [1 ]
Kim, Beomsu [1 ]
Chang, Buru [1 ]
Affiliations
[1] Hyperconnect, Samseong, South Korea
Source
21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2021) | 2021
Keywords
Click-Through Rate Prediction; Embedding Normalization; Factorization Machines
DOI
10.1109/ICDMW53433.2021.00016
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Normalization techniques are known to provide faster model convergence and good generalization performance, and have achieved great success in computer vision and natural language processing. Recently, several deep neural network-based click-through rate (CTR) prediction models have applied such normalization techniques to their deep network components to stabilize model training. However, we observe that applying existing normalization techniques (e.g., Batch Normalization and Layer Normalization) to the feature embeddings of these models degrades model performance. In this study, we conjecture that existing normalization techniques can easily ignore the significance of each feature embedding, leading to suboptimal performance. To support this claim, we theoretically show that existing normalization techniques tend to equalize the norms of individual feature embeddings. To overcome this limitation, we propose a theory-inspired normalization technique, called Embedding Normalization, which not only stabilizes model training but also improves the performance of CTR prediction models by preserving the significance of each feature embedding. Through extensive experiments on various real-world CTR prediction datasets, we show that our proposed normalization technique leads to faster model convergence and achieves performance better than or comparable to that of other normalization techniques. In particular, Embedding Normalization is effective not only in deep neural network-based CTR prediction models but also in shallow CTR prediction models that do not use deep neural network components.
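The abstract's central theoretical claim — that standard normalization tends to equalize the norms of individual feature embeddings — can be illustrated with a minimal sketch. This is not the paper's method, just a demonstration of the claimed effect; the embedding values are made up, and NumPy is assumed:

```python
import numpy as np

# Hypothetical feature embeddings with very different norms
# (e.g., a frequent, informative feature vs. a rare, weak one).
emb = np.array([[10.0, -8.0, 6.0, -4.0],
                [0.1,   0.2, -0.1, 0.05],
                [1.0,   1.0, -1.0, 1.0]])

def layer_norm(x, eps=1e-5):
    """Plain Layer Normalization applied to each embedding vector."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

out = layer_norm(emb)
norms = np.linalg.norm(out, axis=-1)
# Every normalized row has norm ~= sqrt(d) = 2 regardless of its
# original magnitude, erasing the relative significance that the
# embedding norms originally encoded.
```

Embedding Normalization, as described in the abstract, is designed to avoid exactly this collapse by preserving per-feature significance while still stabilizing training.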
Pages: 75 - 84
Page count: 10