Addressing Non-Representative Surveys using Multiple Instance Learning

Cited by: 0

Authors:

Katz, Yaniv [1]

Vainas, Oded [1]

Affiliations:

[1] Similarweb, Tel Aviv, Israel

Source:

KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2021

Keywords:

Multiple Instance Learning; Multiple Instance Regression; Neural networks; Attention Pooling; Bag Representation; Instance Representation; NONRESPONSE

DOI:

10.1145/3447548.3467109

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, non-representative survey sampling and non-response bias have been major obstacles to obtaining a reliable population quantity estimate from finite survey samples. As such, researchers have been focusing on identifying methods to resolve these biases. In this paper, we look at this well-known problem from a fresh perspective and formulate it as a learning problem. To meet this challenge, we suggest solving the learning problem using a multiple instance learning (MIL) paradigm. We devise two different MIL-based neural network topologies, each based on a different implementation of an attention pooling layer. These models are trained to accurately infer the population quantity of interest even when facing a biased sample. To the best of our knowledge, this is the first time MIL has been suggested as a solution to this problem. In contrast to commonly used statistical methods, this approach does not require collecting sensitive personal data from the respondents or accessing population-level statistics of the same sensitive data. To validate the effectiveness of our approaches, we test them on a real-world movie rating dataset, which is used to mimic a biased survey by experimentally contaminating it with different kinds of survey bias. We show that our suggested topologies outperform other MIL architectures and are able to partly counter the adverse effect of biased sampling on the estimation quality. We also demonstrate how these methods can be easily adapted to perform well even when part of the survey is based on a small number of respondents.
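The abstract mentions attention pooling layers but does not specify either of the two topologies. As a rough illustration of the general idea only, the following is a minimal sketch of one common form of attention pooling over a bag of instances (a tanh-scored softmax), and it should not be read as the authors' architecture. The function name `attention_pool` and the parameters `V` and `w` are hypothetical; in a trained model they would be learned, whereas here they are random.

```python
import numpy as np

def attention_pool(instances, V, w):
    """Pool a bag of instance embeddings into a single bag embedding.

    instances: (n, d) array, one row per instance (e.g. one survey respondent).
    V: (d, h) projection matrix and w: (h,) scoring vector. Both are
       hypothetical stand-ins for parameters a network would learn.
    Returns the (d,) bag embedding and the (n,) attention weights.
    """
    scores = np.tanh(instances @ V) @ w       # (n,) unnormalised attention scores
    scores = scores - scores.max()            # shift for a numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    bag = weights @ instances                 # attention-weighted average of instances
    return bag, weights

# Toy bag: 5 instances with 3 features each, random (untrained) parameters.
rng = np.random.default_rng(0)
instances = rng.normal(size=(5, 3))
V = rng.normal(size=(3, 4))
w = rng.normal(size=4)

bag, weights = attention_pool(instances, V, w)
uniform_bag, uniform_weights = attention_pool(instances, V, np.zeros(4))
```

Because the weights are a softmax over per-instance scores, the bag embedding does not depend on the order of the instances, and with a zero scoring vector the layer reduces to plain mean pooling. In a regression setting, the bag embedding would then be fed to a further layer that predicts the population quantity.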


Pages: 3117-3127

Number of pages: 11
