Collecting Representative Social Media Samples from a Search Engine by Adaptive Query Generation

Times cited: 0
Authors
Landeiro, Virgile [1]
Culotta, Aron [1]
Affiliation
[1] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Source
PROCEEDINGS OF THE 2019 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2019) | 2019
Funding
National Science Foundation (NSF), USA
Keywords
classification; data collection; sampling bias
DOI
10.1145/3341161.3342924
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Studies in computational social science often require collecting data about users via a search engine interface: a list of keywords is provided as a query to the interface and documents matching this query are returned. The validity of a study will hence critically depend on the representativeness of the data returned by the search engine. In this paper, we develop a multi-objective approach to build queries yielding documents that are both relevant to the study and representative of the larger population of documents. We then specify measures to evaluate the relevance and the representativeness of documents retrieved by a query system. Using these measures, we experiment on three real-world datasets and show that our method outperforms baselines commonly used to solve this data collection problem.
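The abstract does not spell out the query-generation algorithm or the exact relevance and representativeness measures. As a rough illustration only, the sketch below assumes: each document carries a known relevance label and a group label (for example, a demographic attribute); relevance is the precision of the retrieved set; representativeness is the Hellinger distance between the group distribution of retrieved relevant documents and that of all relevant documents; and the two objectives are combined by a weighted sum (`alpha`) that is optimized by greedily adding keywords to an OR query. The functions `evaluate` and `greedy_query`, the parameter `alpha`, and the toy data are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy setup: each document has a token set, a relevance label,
# and a group label (e.g., a demographic attribute) used to judge representativeness.
CATEGORIES = ["A", "B"]


def distribution(labels, categories):
    """Empirical distribution of `labels` over a fixed list of categories."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    return [counts[c] / total if total else 0.0 for c in categories]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def evaluate(query, docs, categories):
    """Return (relevance, distance) for an OR query over `docs`.

    relevance: fraction of retrieved documents that are relevant (precision).
    distance: Hellinger distance between the group distribution of retrieved
    relevant documents and that of all relevant documents (0 = representative).
    """
    retrieved = [d for d in docs if query & d["tokens"]]  # any keyword matches
    if not retrieved:
        return 0.0, 1.0
    relevance = sum(d["relevant"] for d in retrieved) / len(retrieved)
    sample = [d["group"] for d in retrieved if d["relevant"]]
    if not sample:
        return relevance, 1.0  # nothing relevant retrieved: treat as maximally biased
    population = [d["group"] for d in docs if d["relevant"]]
    return relevance, hellinger(distribution(population, categories),
                                distribution(sample, categories))


def greedy_query(docs, categories, alpha=0.5, max_terms=5):
    """Greedily add keywords while a weighted sum of both objectives improves."""
    candidates = set().union(*(d["tokens"] for d in docs))
    query, best_score = set(), -math.inf
    for _ in range(max_terms):
        best_term = None
        for term in sorted(candidates - query):  # sorted for deterministic ties
            relevance, distance = evaluate(query | {term}, docs, categories)
            score = alpha * relevance + (1 - alpha) * (1 - distance)
            if score > best_score:
                best_score, best_term = score, term
        if best_term is None:  # no single keyword improves the objective
            break
        query.add(best_term)
    return query, best_score


toy_docs = [
    {"tokens": {"flu", "fever"}, "relevant": True, "group": "A"},
    {"tokens": {"flu"}, "relevant": True, "group": "A"},
    {"tokens": {"cough"}, "relevant": True, "group": "B"},
    {"tokens": {"cough", "sick"}, "relevant": True, "group": "B"},
    {"tokens": {"sick", "game"}, "relevant": False, "group": "A"},
    {"tokens": {"game"}, "relevant": False, "group": "B"},
]

chosen_query, chosen_score = greedy_query(toy_docs, CATEGORIES)
print(sorted(chosen_query), chosen_score)  # ['cough', 'flu'] 1.0
```

On the toy data, the single keyword `cough` is perfectly relevant but retrieves only group B; adding `flu` restores the population's 50/50 group balance, so the combined score reaches its maximum and the search stops. A weighted sum is only one way to trade off the two objectives; a Pareto-front search or a constrained formulation would be equally plausible readings of "multi-objective".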
Pages: 204-207
Number of pages: 4