Exploiting covariate embeddings for classification using Gaussian processes

Cited by: 1
Authors
Andrade, Daniel [1 ]
Tamura, Akihiro [2 ]
Tsuchida, Masaaki [3 ]
Affiliations
[1] NEC Corp Ltd, Secur Res Labs, Tokyo, Japan
[2] Ehime Univ, Grad Sch Sci & Engn, Matsuyama, Ehime, Japan
[3] DeNA Co Ltd, Tokyo, Japan
Keywords
Logistic regression; Auxiliary information of covariates; Gaussian process; Text classification
DOI
10.1016/j.patrec.2018.01.011
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In many logistic regression tasks, auxiliary information about the covariates is available. For example, a user might be able to specify a similarity measure between the covariates, or an embedding (feature vector) for each covariate, created from unlabeled data. For text classification in particular, the covariates (words) can be described by word embeddings or by similarity measures from lexical resources such as WordNet. We propose a new method to use such embeddings of covariates for logistic regression. Our method consists of two main components. The first component is a Gaussian process (GP) with a covariance function that models the correlations between covariates and returns a noise-free estimate of the covariates. The second component is a logistic regression model that uses these noise-free estimates. One advantage of our model is that the covariance function can be adjusted to the training data using maximum likelihood. Another advantage is that new covariates that never occurred in the training data can be incorporated at test time, while the run time increases only linearly in the number of new covariates. Our experiments demonstrate the usefulness of our method in situations where only a small amount of training data is available. (c) 2018 Elsevier B.V. All rights reserved.
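The two-component idea in the abstract can be sketched roughly as follows. This is an illustrative assumption-laden sketch, not the paper's implementation: it assumes an RBF covariance built from word embeddings, uses the standard GP posterior mean as the "noise-free estimate" step, and feeds the smoothed covariates into an off-the-shelf logistic regression. The embeddings, documents, and noise level below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 2-d embeddings for 4 "words" (covariates); words 0/1 and
# words 2/3 form two similar pairs. Not values from the paper.
emb = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [0.1, 0.9]])

def rbf_covariance(E, length_scale=1.0):
    # Squared-exponential covariance between covariates, computed from
    # the pairwise distances of their embeddings.
    sq_dists = ((E[:, None, :] - E[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

K = rbf_covariance(emb)
noise = 0.1  # assumed observation-noise variance (a free hyperparameter)

# GP posterior mean as a linear denoising operator on the observed
# covariate values: x_clean = K (K + noise * I)^{-1} x.
S = K @ np.linalg.inv(K + noise * np.eye(len(K)))

# Toy document-term matrix (documents x words) and labels.
X = np.array([[2, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 2, 1],
              [0, 1, 1, 2]], dtype=float)
y = np.array([0, 0, 1, 1])

# Smooth each document's covariates, then fit plain logistic regression.
X_smooth = X @ S.T
clf = LogisticRegression().fit(X_smooth, y)
```

Because `S` is defined purely by the embeddings, a new word unseen in training only extends the kernel matrix, which is consistent with the abstract's claim that test-time cost grows linearly in the number of new covariates.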
Pages: 8 - 14
Number of pages: 7