Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources

Cited by: 97
Authors
Dong, Xin Luna [1]
Gabrilovich, Evgeniy [1]
Murphy, Kevin [1]
Dang, Van [1]
Horn, Wilko [1]
Lugaresi, Camillo [1]
Sun, Shaohua [1]
Zhang, Wei [1]
Affiliation
[1] Google Inc, Palo Alto, CA 94301 USA
Source
Proceedings of the VLDB Endowment | 2015 / Vol. 8 / No. 9
DOI
10.14778/2777598.2777603
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the web graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy. The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model. We call the trustworthiness score we compute Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.
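The abstract does not state the model's equations. As a rough illustration of its core idea (a source is trustworthy if the facts it provides are mostly correct), here is a minimal sketch of a simplified, single-layer iterative truth-discovery loop in the style of ACCU-like data-fusion models. It is an assumption-laden stand-in, not the paper's multi-layer KBT model: it treats every triple as if the source really stated it, so it cannot separate extraction errors from source errors. The function name `estimate_source_accuracy`, the parameters `n_false`, `iterations`, and `init_acc`, and the toy claims are all hypothetical.

```python
import math
from collections import defaultdict


def estimate_source_accuracy(claims, n_false=10, iterations=20, init_acc=0.8):
    """Simplified single-layer truth discovery (NOT the multi-layer KBT model).

    claims: list of (source, data_item, value) triples, assumed to be exactly
            what each source states (no extraction noise is modelled).
    n_false: hypothetical assumed number of false values per data item.
    Returns (accuracy per source, probability per (data_item, value)).
    """
    sources = {s for s, _, _ in claims}
    accuracy = {s: init_acc for s in sources}

    # Group claims by data item: item -> list of (source, value).
    by_item = defaultdict(list)
    for s, d, v in claims:
        by_item[d].append((s, v))

    prob = {}
    for _ in range(iterations):
        # Step 1: each source votes for its value with weight ln(n*A/(1-A)),
        # so sources with higher estimated accuracy count for more.
        prob = {}
        for d, votes in by_item.items():
            score = defaultdict(float)
            for s, v in votes:
                a = min(max(accuracy[s], 0.01), 0.99)  # keep the log finite
                score[v] += math.log(n_false * a / (1.0 - a))
            # Softmax over the candidate values observed for this data item.
            top = max(score.values())
            z = sum(math.exp(x - top) for x in score.values())
            for v, x in score.items():
                prob[(d, v)] = math.exp(x - top) / z

        # Step 2: a source's accuracy is the mean probability that its claims are true.
        total = defaultdict(float)
        count = defaultdict(int)
        for s, d, v in claims:
            total[s] += prob[(d, v)]
            count[s] += 1
        accuracy = {s: total[s] / count[s] for s in sources}

    return accuracy, prob


if __name__ == "__main__":
    # Toy data (hypothetical): site_c disagrees with the other two on every item.
    claims = [
        ("site_a", "d1", "x"), ("site_b", "d1", "x"), ("site_c", "d1", "y"),
        ("site_a", "d2", "p"), ("site_b", "d2", "p"), ("site_c", "d2", "q"),
    ]
    accuracy, _ = estimate_source_accuracy(claims)
    for source in sorted(accuracy):
        print(source, round(accuracy[source], 3))
```

On the toy data, `site_a` and `site_b` converge to accuracies close to 1 and `site_c` to an accuracy close to 0, matching the intuition that a source with few false facts is trustworthy. Per the abstract, KBT goes further by adding layers for the extraction process, so that a wrong triple can be blamed on the extractor rather than on the web source itself.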
Pages: 938-949
Number of pages: 12